[R] spsurvey analysis
Tim Howard
tghoward at gw.dec.state.ny.us
Fri Nov 1 19:57:34 CET 2013
Jason,
Thank you for your reply. Interesting ... so you think the 'classes' in the error message "The combined number of values in at least one class..." is referring to the CDF bins rather than the sub-population classes that I defined.
That makes sense as I only defined two classes (!). I was worried it was detecting and treating polygons as classes, somehow (ecoregions in example below).
I had already reached out to Kincaid and Olsen but had not received an answer yet so I moved on to R-help. I'll go back to them.
Thanks again.
Best,
Tim
>>> "Law, Jason" <Jason.Law at portlandoregon.gov> 11/1/2013 2:47 PM >>>
I use the spsurvey package a decent amount. The cont.cdftest function bins the cdf in order to perform the test which I think is the root of the problem. Unfortunately, the default is 3 which is the minimum number of bins.
I would contact Tom Kincaid or Tony Olsen at NHEERL WED directly to ask about this problem.
Another option would be to take a different analytical approach (e.g., a mixed effects model) which would allow you a lot more flexibility.
Jason Law
Statistician
City of Portland
Bureau of Environmental Services
Water Pollution Control Laboratory
6543 N Burlington Avenue
Portland, OR 97203-5452
503-823-1038
jason.law at portlandoregon.gov
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Tim Howard
Sent: Friday, November 01, 2013 7:49 AM
To: r-help at r-project.org
Subject: [R] spsurvey analysis
All,
I've used the excellent package, spsurvey, to create spatially balanced samples many times in the past. I'm now attempting to use the analysis portion of the package, which compares CDFs among sub-populations to test for differences in sub-population metrics.
- My data (count data) have many zeros, following a negative binomial or even zero-inflated negative binomial distribution.
- Samples are within polygons of varying sizes
- I want to test whether a sample at time 1 is different from a sample at time 2. Essentially the same sample areas and number of samples.
The problem:
- cont.cdftest throws a warning and does not complete for most (but not all) species sampled. Warning message: "The combined number of values in at least one class is less than five. Action: The user should consider using a smaller number of classes."
- There are plenty of samples in my two time periods (the dummy set below: Yr1=27, Yr2=31 non-zero values).
My Question:
Why is it throwing this error and is there a way to get around it?
Reproduceable example (change path to spsurvey sample data), requires us to use spsurvey to generate sample points:
### R code tweaked from vignettes 'Area_Design' and 'Area_Analysis'
library(spsurvey)
### Analysis set up
setwd("C:/Program Files/R/R-3.0.2/library/spsurvey/doc")
att <- read.dbf("UT_ecoregions")
shp <- read.shape("UT_ecoregions")
set.seed(4447864)
# Create the design list
Stratdsgn <- list("Central Basin and Range"=list(panel=c(PanelOne=25), seltype="Equal"),
"Colorado Plateaus"=list(panel=c(PanelOne=25), seltype="Equal"),
"Mojave Basin and Range"=list(panel=c(PanelOne=10), seltype="Equal"),
"Northern Basin and Range"=list(panel=c(PanelOne=10), seltype="Equal"),
"Southern Rockies"=list(panel=c(PanelOne=14), seltype="Equal"),
"Wasatch and Uinta Mountains"=list(panel=c(PanelOne=10), seltype="Equal"),
"Wyoming Basin"=list(panel=c(PanelOne=6), seltype="Equal"))
# Select the sample design for each year
Stratsites_Yr1 <- grts(design=Stratdsgn, DesignID="STRATIFIED",
type.frame="area", src.frame="sp.object",
sp.object=shp, att.frame=att, stratum="Level3_Nam", shapefile=FALSE)
Stratsites_Yr2 <- grts(design=Stratdsgn, DesignID="STRATIFIED",
type.frame="area", src.frame="sp.object",
sp.object=shp, att.frame=att, stratum="Level3_Nam", shapefile=FALSE)
#extract the core information, add year as a grouping variable, add a plot ID to link with dummy data
Yr1 <- cbind(pltID = 1001:1100, Stratsites_Yr1 at data[,c(1,2,3,5)], grp = "Yr1")
Yr2 <- cbind(pltID = 2001:2100, Stratsites_Yr2 at data[,c(1,2,3,5)], grp = "Yr2")
sitedat <- rbind(Yr1, Yr2)
# create dummy sampling data. Lots of zeros!
bn.a <- rnbinom(size = 0.06, mu = 19.87, n=100) bn.b <- rnbinom(size = 0.06, mu = 20.15, n=100) dat.a <- data.frame(pltID = 1001:1100, grp = "Yr1",count = bn.a) dat.b <- data.frame(pltID = 2001:2100, grp = "Yr2",count = bn.b) dat <- rbind(dat.a, dat.b)
####
## Analysis begins here
####
data.cont <- data.frame(siteID = dat$pltID, Density=dat$count) sites <- data.frame(siteID = dat$pltID, Use=rep(TRUE, nrow(dat))) subpop <- data.frame(siteID = dat$pltID,
All_years=(rep("allYears",nrow(dat))),
Year = dat$grp)
design <- data.frame(siteID = sitedat$pltID,
wgt = sitedat$wgt,
xcoord = sitedat$xcoord,
ycoord = sitedat$ycoord)
framesize <- c("Yr1"=888081202000, "Yr2"=888081202000)
## There seem to be pretty good estimates CDF_Estimates <- cont.analysis(sites, subpop, design, data.cont,
popsize = list(All_years=sum(framesize),
Year = as.list(framesize)))
print(CDF_Estimates$Pct)
## this test fails
CDF_Tests <- cont.cdftest(sites, subpop[,c(1,3)], design, data.cont,
popsize=list(Year=as.list(framesize)))
warnprnt()
## how many records have values greater than zero, by year? Probably irrelevant!
notZero <- dat[dat$count > 0,]
table(notZero$grp)
### end
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
Thanks in advance.
Tim
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list