[R-sig-ME] Bootstrap the variance difference

Joshua Wiley jwiley.psych at gmail.com
Mon Feb 6 23:08:35 CET 2012


Hi Kevin,

I worry a bit that your model is correctly specified and you are doing
what you really want to be doing, but that is a bit of a different
question (perhaps worth checking).  In any case, your models did not
run on my system with your example data.  Supposing all works well in
the full data and you are confident with what you have, then this is
one approach to dealing with the boot issue:

rather than pass two data sets, pass one wide data set in.  Use
slightly different names and pass in two formulae to use the correct
variables.

Hope this helps,

Josh

P.S. If real data is large, this may be somewhat slow and would easily
benefit from parallelizing if you have multiple cores and sufficient
memory----check out the parallel option to boot().

#####################################################

dat1 <- read.csv(file.choose()) ## final
dat2 <- read.csv(file.choose()) ## initial

colnames(dat1) <- paste("f", colnames(dat1), sep = '')
colnames(dat2) <- paste("i", colnames(dat2), sep = '')

dat <- cbind(dat2, dat1)

varcomp <- function (lformula, dat, indices) {
  d <- dat[indices, ]
  fit1 <- lmer(lformula[[1]], data=d) #linear model
  fit2 <- lmer(lformula[[2]], data=d) #linear model
  a <- (attr (VarCorr(fit1), "sc")^2) #output variance estimation
  b <- (attr (VarCorr(fit2), "sc")^2) #output variance estimation
  drv <- (a - b) #difference between the variance estimations
  return(drv)
}

require(lme4)
require(boot)

system.time(ip1.boot <- boot (data = dat, statistic = varcomp, R =
100, lformula = list(
  initial = iCNPC ~ (1 | iCell.line) + (1 | iDNA.extract) + iCell.line,
  final = fCNPC ~ (1 | fCell.line) + (1 | fDNA.extract) + fCell.line)))



On Mon, Feb 6, 2012 at 1:32 PM, Kevin Spring <kevinjspring at gmail.com> wrote:
>  I asked this question on Stack Exchange, but I think it might be too
> specialized.  Hopefully someone in the mixed model group can help me out.
>
> I want to be able to bootstrap the variance differences between two data
> sets obtained at different times while taking out the error in a random
> effect.
>
> I have 2 sets of experimental data, where the data was measured at 2 time
> points (initial and final). I also have a set of simulation data. I want to
> compare the variance of the simulated date with the variance difference
> between the experimental data (final - initial). The idea is to get
> confidence intervals from the bootstrap to compare the experimental data
> with the simulation.
>
> I am having trouble making the statistic for the bootstrap function in the
> boot package for R. So far I have.
>
> varcomp <- function ( formula, data, indices ) {
>    d <- data[indices,] #sample for boot
>    fit <- lmer(formula, data=d) #linear model
>    res.var = (attr (VarCorr(fit), "sc")^2) # variance estimation
>    return(res.var)
>    }
>
> But this function only returns the variance of a single data set. I want to
> be able to input 2 sets of data and have it return the difference between
> the two data sets' variance.
>
> When I try something like:
>
> varcomp <- function ( formula, data1, data2, indices ) {
> d1 <- data1[indices,] #sample for boot
> d2 <- data2[indices,] #sample for boot
> fit1 <- lmer(formula, data=d1) #linear model
> fit2 <- lmer(formula, data=d2) #linear model
> a = (attr (VarCorr(fit1), "sc")^2) #output variance estimation
> b = (attr (VarCorr(fit2), "sc")^2) #output variance estimation
> drv = a - b #difference between the variance estimations
> return(drv)
> }
>
> I would then put it into boot such as:
>
> ip1.boot <- boot ( data = ip1, statistic=varcomp, R=100,
> formula=CNPC~(1|Cell.line:DNA.extract)+Cell.line)
>
> I can't do it this way because the boot function only allows for one data
> set to be inputted.
>
> *Does anyone know how to create the correct statistic function for this?*
>
> An example of the data can also be downloaded
> here<http://www.mediafire.com/file/68a3ro1cfneiy2r/data.zip.zip>(2 csv
> files zipped 1.22KB.)
>
> My data looks something like the following:
>
> Initial
>
>       Cell.line    Time DNA.extract   Gene      CNPC
> 1          9 initial           1 atubP1 1778.4589
> 2          9 initial           1 atubP1 2108.0552
> 3          9 initial           1 atubP1 2118.6725
> 4          9 initial           2 atubP1 2018.6593
> 5          9 initial           2 atubP1 1935.9008
> 6          9 initial           2 atubP1 1749.9158
> 7          9 initial           3 atubP1 1524.7475
> 8          9 initial           3 atubP1 1532.9781
> 9          9 initial           3 atubP1 1693.3098
> 10        17 initial           1 atubP1 1076.4720
> 11        17 initial           1 atubP1 1101.3315
> 12        17 initial           1 atubP1 1185.3606
> 13        17 initial           2 atubP1 1131.1118
> 14        17 initial           2 atubP1  892.7087
> 15        17 initial           2 atubP1 1028.5465
> 16        17 initial           3 atubP1  887.9972
> 17        17 initial           3 atubP1  732.9646
> 18        17 initial           3 atubP1  680.6724
>
> Final
>
>   Cell.line  Time DNA.extract   Gene      CNPC
> 1          9 final           1 atubP1 1262.2378
> 2          9 final           1 atubP1 1261.9858
> 3          9 final           1 atubP1 1390.6873
> 4          9 final           2 atubP1 1539.7180
> 5          9 final           2 atubP1 1510.5405
> 6          9 final           2 atubP1 1443.1767
> 7          9 final           3 atubP1 1456.2050
> 8          9 final           3 atubP1 1578.6396
> 9          9 final           3 atubP1 1656.1822
> 10        17 final           1 atubP1 1462.5179
> 11        17 final           1 atubP1 1580.9956
> 12        17 final           1 atubP1 1255.9020
> 13        17 final           2 atubP1  886.7579
> 14        17 final           2 atubP1  581.8116
> 15        17 final           2 atubP1  722.0526
> 16        17 final           3 atubP1 4168.7895
> 17        17 final           3 atubP1 3266.2105
> 18        17 final           3 atubP1 4219.5645
>
>        [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/




More information about the R-sig-mixed-models mailing list