[R] Survey Design / Rake questions
Farley, Robert
FarleyR at metro.net
Fri Aug 29 03:04:13 CEST 2008
I'm feeling like I just don't get it. My attempt at rake now fails
with:
Error in postStratify.survey.design(design, strata[[i]],
population.margins[[i]], :
Stratifying variables don't match
The factors in the data frame look fine. Should I have the same
structure in the design?
> str(EBDesign$lineon)
NULL
> str(EBSurvey$lineon)
Factor w/ 13 levels "Warner Center",..: 3 1 1 1 2 13 1 5 1 5 ...
> str(ByEBOn$StnName)
Factor w/ 13 levels "Balboa","De Soto",..: 11 2 5 8 6 1 12 7 10 13 ...
> all(levels(EBSurvey$lineon)==StnName)
[1] TRUE
> #
> str(EBDesign$NumStn)
NULL
> str(EBSurvey$NumStn)
Factor w/ 12 levels "1","2","3","4",..: 10 12 4 12 8 1 8 8 12 4 ...
> str(ByEBNum$StnTraveld)
Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
> all(levels(EBSurvey$NumStn)==StnTraveld)
[1] TRUE
A complete listing is below:
**************************************************
**************************************************
**************************************************
> sessionInfo() # List loaded packages
R version 2.7.2 (2008-08-25)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] graphics grDevices utils datasets stats methods base
other attached packages:
[1] survey_3.8-1 fortunes_1.3-5 moonsun_0.1 prettyR_1.3-2
foreign_0.8-29
> SurveyData <- read.spss("C:/Data/R/orange_delivery.sav",
use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
> #===============================================================================
> temp <- sub(' +$', '', SurveyData$direction_)
> SurveyData$direction_ <- temp
> #===============================================================================
> # Calc. # stations traversed from StnOn/StnOff
> SurveyData$NumStn=abs(as.numeric(SurveyData$lineon)-as.numeric(SurveyData$lineoff))
> #################################################### Kludge
> mean(SurveyData$NumStn)
[1] 6.785276
> SurveyData$NumStn <- pmax(1,SurveyData$NumStn)
> mean(SurveyData$NumStn)
[1] 6.789877
> ####################################################
> SurveyData$NumStn <- as.factor(SurveyData$NumStn)
> #===============================================================================
> # Adjust one direction at a time. Start W/ EB {learn subsetting later}
> EBSurvey <- subset(SurveyData, direction_ == "EASTBOUND" )
> EBDesign <- svydesign(id=~sampn, weights=~expwgt, data=EBSurvey)
> #===============================================================================
> # New Marginals {start w/ 2 dimensions: StnOn X Distance}
> StnName <- as.factor(c( "Warner Center", "De Soto", "Pierce College",
"Tampa", "Reseda", "Balboa", "Woodley", "Sepulveda", "Van Nuys",
"Woodman", "Valley College", "Laurel Canyon", "North Hollywood"))
> EBOnNewTots <- c( 1000, 600, 1200,
500, 1000, 500, 200, 250, 1000, 300,
100, 123.65, 0 )
> ByEBOn <- data.frame(StnName, Freq=EBOnNewTots)
> #
> StnTraveld <- as.factor(1:12)
> EBNumStn <- c(673.65, 800, 1000, 1000, 800, 700, 600, 500,
400, 200, 50, 50 )
> ByEBNum <- data.frame(StnTraveld, Freq=EBNumStn)
> #
> RakedEBSurvey <- rake(EBDesign, list(~lineon, ~NumStn), list(ByEBOn,
ByEBNum) )
Error in postStratify.survey.design(design, strata[[i]],
population.margins[[i]], :
Stratifying variables don't match
> #
> str(EBDesign$lineon)
NULL
> str(EBSurvey$lineon)
Factor w/ 13 levels "Warner Center",..: 3 1 1 1 2 13 1 5 1 5 ...
> str(ByEBOn$StnName)
Factor w/ 13 levels "Balboa","De Soto",..: 11 2 5 8 6 1 12 7 10 13 ...
> all(levels(EBSurvey$lineon)==StnName)
[1] TRUE
> #
> str(EBDesign$NumStn)
NULL
> str(EBSurvey$NumStn)
Factor w/ 12 levels "1","2","3","4",..: 10 12 4 12 8 1 8 8 12 4 ...
> str(ByEBNum$StnTraveld)
Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
> all(levels(EBSurvey$NumStn)==StnTraveld)
[1] TRUE
> #
**************************************************
**************************************************
**************************************************
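A possible reading of the error above, offered as a sketch rather than a
confirmed diagnosis: the margin data frames name their columns StnName and
StnTraveld, while the formulas passed to rake() name lineon and NumStn. The
postStratify() help describes a data-frame margin as one frequency column
plus columns for the stratifying variables, so the names probably need to
agree. (The NULL from str(EBDesign$lineon) is likely harmless: a
survey.design object keeps its data in EBDesign$variables.) The helper
unmatched_margin_cols() below is hypothetical, not part of the survey
package, and the two-station margin is cut down for illustration. It uses
base R only.

```r
# Hypothetical helper (NOT part of the survey package): list the columns of a
# margin data frame that are neither a variable in the formula nor "Freq".
unmatched_margin_cols <- function(formula, margin) {
  setdiff(names(margin), c(all.vars(formula), "Freq"))
}

# Cut-down version of the ByEBOn margin from the listing (illustrative values)
StnName <- factor(c("Warner Center", "De Soto"))
ByEBOn  <- data.frame(StnName, Freq = c(1000, 600))

# The formula uses lineon, but the margin column is called StnName
before <- unmatched_margin_cols(~lineon, ByEBOn)
print(before)

# Rename the margin column so it matches the survey variable
names(ByEBOn)[names(ByEBOn) == "StnName"] <- "lineon"
after <- unmatched_margin_cols(~lineon, ByEBOn)
print(after)
```

If the names are indeed the cause, the same renaming (StnTraveld to NumStn)
would apply to ByEBNum before calling rake() again.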
Robert Farley
Metro
www.Metro.net
-----Original Message-----
From: Thomas Lumley [mailto:tlumley at u.washington.edu]
Sent: Thursday, August 28, 2008 11:43
To: Farley, Robert
Cc: r-help at r-project.org
Subject: Re: [R] Survey Design / Rake questions
On Mon, 25 Aug 2008, Farley, Robert wrote:
> I see a number of things that bother me.
> 1) str(ByEBNum$StnTraveld) says "int [1:12] 1 2 3 4 5 6 7 8 9 10 ..."
> Even though "StnTraveld <- c(as.factor(1:12))"
You don't want the c()
> a<-as.factor(1:12)
> str(a)
Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
> str(c(a))
int [1:12] 1 2 3 4 5 6 7 8 9 10 ...
As the help for c() says "all attributes except names are removed.",
which includes the factor levels.
> 2) ByEBOn$StnName[1:5] seems to imply I have extra spaces in the
> data. Where would they have come from?
No, that's just R printing things in columns
> a<-factor(1:12, labels=c(1:11,"antidisestablishmentarianism"))
> a
[1] 1 2
[3] 3 4
[5] 5 6
[7] 7 8
[9] 9 10
[11] 11 antidisestablishmentarianism
Levels: 1 2 3 4 5 6 7 8 9 10 11 antidisestablishmentarianism
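To check directly whether any level really carries trailing spaces, rather
than judging from the printed columns, something like the following should
work. The factor here is made up for illustration, with one deliberately
padded level.

```r
# Illustrative factor; "Tampa " has a trailing space on purpose
stn <- factor(c("Balboa", "De Soto", "Tampa "))

# Levels that end in one or more spaces
padded <- levels(stn)[grepl(" +$", levels(stn))]
print(padded, quote = TRUE)

# Strip the trailing spaces from the levels themselves
levels(stn) <- sub(" +$", "", levels(stn))
print(levels(stn), quote = TRUE)
```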
> 3) I'd like to verify that the order (value) of "EBSurvey$lineon"
> matches my definition in "StnName"
all(levels(EBSurvey$lineon)==StnName)
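A slightly stricter variant, as a sketch with a shortened, illustrative
station list: identical() does not recycle, so it also catches a length
mismatch. Note that as.factor() on a character vector sorts the levels
alphabetically, whereas factor(x, levels = x) keeps the order given.

```r
# Shortened station list in route order (illustrative only)
route <- c("Warner Center", "De Soto", "Pierce College")

by_default <- as.factor(route)             # levels sorted alphabetically
in_order   <- factor(route, levels = route) # levels kept in route order

print(levels(by_default))
print(levels(in_order))

# Strict check that the level order matches the intended definition
same_order <- identical(levels(in_order), route)
print(same_order)
```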
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle