[R-sig-ME] MCMCglmm predictions with fixed effects Splines
Joshua Wiley
jwiley.psych at gmail.com
Wed Mar 27 08:10:35 CET 2013
I do not think the ifelse statements you have are what you want
either. Look at how this simple example is expanded with only three
knots:
> spl(1:3, k = 3)
[,1] [,2] [,3]
[1,] 3.80521304 -2.7748166 0.55521304
[2,] 0.09910059 0.1982012 0.09910059
[3,] 0.55521304 -2.7748166 3.80521304
simply cutting the data is not sufficient (FYI a more efficient way to
do that if you were would be with ?cut). Have you looked at the
source code for spl? It is short and readable. Essentially taking
the outer product of the data and knot arrays (determined as you
said), by cubing the absolute deviations, and then post multiplies
that using the singular value decomposition of it (X = UDV'), V
sqrt(D^-1) U'
Anyway, to make the new appropriate matrix for prediction, rather than
rolling your own, probably easiest to pass in your own knot values.
So:
knots <- quantile(1:10, 1:3/(3+1))
> knots
25% 50% 75%
3.25 5.50 7.75
## imaginary "real" use
> spl(1:10, knots = knots)
[,1] [,2] [,3]
[1,] 36.3243413 -26.4882371 5.3000313
[2,] 21.3916729 -8.9747525 1.6810714
[3,] 11.3613809 -0.1816752 0.1360608
[4,] 5.2276907 2.4526145 -0.2523869
[5,] 1.9051290 2.0996457 0.2879496
[6,] 0.2879496 2.0996457 1.9051290
[7,] -0.2523869 2.4526145 5.2276907
[8,] 0.1360608 -0.1816752 11.3613809
[9,] 1.6810714 -8.9747525 21.3916729
[10,] 5.3000313 -26.4882371 36.3243413
## now for the prediction data
> spl(1:3, knots = knots)
[,1] [,2] [,3]
[1,] 36.32434 -26.4882371 5.3000313
[2,] 21.39167 -8.9747525 1.6810714
[3,] 11.36138 -0.1816752 0.1360608
## what happens if you do not use the same knots
> spl(1:3, k=3)
[,1] [,2] [,3]
[1,] 3.80521304 -2.7748166 0.55521304
[2,] 0.09910059 0.1982012 0.09910059
[3,] 0.55521304 -2.7748166 3.80521304
Agree with Jarrod that a penalized spline model is more typical.
Since you're Bayesian, you could also shrink them using stronger
priors on those terms in the fixed effects region. Not really the
same thing, but could help reign in the effects if you want to leave
them as fixed effects.
Cheers,
Josh
On Tue, Mar 26, 2013 at 10:38 AM, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
> Hi Antonio,
>
> The penalised bit of a penalised spline is achieved by having the spline
> coefficients as random effects rather than fixed.
>
> Cheers,
>
> Jarrod
>
>
> Quoting "Antonio P. Ramos" <ramos.grad.student at gmail.com> on Tue, 26 Mar
> 2013 10:06:57 -0700:
>
>> Hi Jarrod,
>>
>> Thanks for your reply.
>>
>> I think I need a spline as a fixed effect only - time doesn't vary by
>> CASEID and all observations are subjects for the same time trends. I would
>> be nice to have a cubic spline though. Thanks
>>
>>
>> On Tue, Mar 26, 2013 at 2:38 AM, Jarrod Hadfield
>> <j.hadfield at ed.ac.uk>wrote:
>>
>>> Hi,
>>>
>>> Check out the examples in ?spl: you haven't fitted a penalised spline.
>>> You
>>> probably want something like:
>>>
>>> glm.MC.2 <- MCMCglmm(mortality.under.2 ~ maternal_age_c +
>>> I(maternal_age_c^2) + birth_year + residence + maternal_educ +
>>> sex + wealth, nitt=20000, thin=10, burnin=1000,random=
>>> ~idv(spl(birdth_year))+CASEID,
>>> prior=prior.2,data=rwanda2,**family='categorical',
>>> pr=TRUE)
>>>
>>> Note that I have saved the random effects (pr=TRUE) because the first k
>>> random effects are the spline coefficients. You will need to associate
>>> these with the relevant columns of Z (rather than X) to get predictions.
>>> Remember to include the fixed birth_year effect too.
>>>
>>> Cheers,
>>>
>>> Jarrod
>>>
>>>
>>>
>>> Quoting "Antonio P. Ramos" <ramos.grad.student at gmail.com> on Mon, 25 Mar
>>> 2013 19:57:51 -0700:
>>>
>>> I think I get what is going on a little better now. Does it make sense?
>>>>
>>>>
>>>> # getting the predictions
>>>> # where the nots are?
>>>> k <- 10
>>>> x <- quantile(rwanda2$birth_year, 1:k/(k + 1), na.rm = T)
>>>> y <- unique(rwanda2$birth_year)
>>>>
>>>> # creating new data for the poor
>>>> pred.data <- data.frame(maternal_age_c=rep(**0,28),wealth=rep("Lowest
>>>> quintile",28),
>>>>
>>>> sex=rep("Female",28),**residence=rep("Rural",28),
>>>> "spl(birth_year)1"=ifelse(y<**1976,1,0),
>>>> "spl(birth_year)2"=ifelse(y>**1975&y<1979,1,0),
>>>> "spl(birth_year)3"=ifelse(y>**1978&y<1981,1,0),
>>>> "spl(birth_year)4"=ifelse(y>**1980&y<1983,1,0),
>>>> "spl(birth_year)5"=ifelse(y>**1982&y<1985,1,0),
>>>> "spl(birth_year)6"=ifelse(y>=**1984&y<1988,1,0),
>>>>
>>>> "spl(birth_year)7"=ifelse(y>=**1987&y<=1990,1,0),
>>>> "spl(birth_year)8"=ifelse(y>**1989&y<1993,1,0),
>>>> "spl(birth_year)9"=ifelse(y>**1993&y<1996,1,0),
>>>> "spl(birth_year)10"=ifelse(y>**1995,1,0),
>>>> birth_order=rep(1,28),
>>>> maternal_educ=rep("No education",28))
>>>>
>>>> pred.data$wealth <- factor(pred.data$wealth,
>>>> levels=c("Lowest quintile", "Second
>>>> quintile","Middle quintile","Fourth quintile","Highest quintile"))
>>>> pred.data$sex <- factor(pred.data$sex, levels=c("Male","Female"))
>>>> pred.data$residence <- factor(pred.data$residence,**
>>>> levels=c("Rural","Urban"))
>>>> pred.data$maternal_educ <- factor(pred.data$maternal_**educ,
>>>> levels=c("No education", "Primary",
>>>> "Secondary","Higher"))
>>>>
>>>>
>>>> # design matrix
>>>> X <- model.matrix(~ maternal_age_c + I(maternal_age_c^2) + residence +
>>>> + maternal_educ +
>>>> birth_order + wealth +
>>>> spl.birth_year.1 +
>>>> spl.birth_year.2 +
>>>> spl.birth_year.3 +
>>>> spl.birth_year.4 +
>>>> spl.birth_year.5 +
>>>> spl.birth_year.6 +
>>>> spl.birth_year.7 +
>>>> spl.birth_year.8 +
>>>> spl.birth_year.9 +
>>>> spl.birth_year.10, data=pred.data)
>>>>
>>>>
>>>> V <- rowSums(glm.MC.2$VCV) # marginalizing over random effects
>>>> beta <- glm.MC.2$Sol # fixed effects
>>>> c2 <- (16*sqrt(3)/(15*pi))^2 # alterative parametrization for the
>>>> logistic
>>>> distribution
>>>> pred<- t(plogis(t(beta%*%t(X)/sqrt(1+**c2*V))))
>>>> pred <- as.data.frame(pred)
>>>> colnames(pred) <- 1970:1997 # predictions for the poor for every year
>>>> colSums(pred)
>>>> pred.poor <- pred
>>>>
>>>>
>>>>
>>>> On Mon, Mar 25, 2013 at 5:14 PM, Antonio P. Ramos <
>>>> ramos.grad.student at gmail.com> wrote:
>>>>
>>>> maybe the model's summary would also help:
>>>>>
>>>>>
>>>>> > summary(glm.MC.2)
>>>>>
>>>>> Iterations = 1001:19991
>>>>> Thinning interval = 10
>>>>> Sample size = 1900
>>>>>
>>>>> DIC: 23202.78
>>>>>
>>>>> G-structure: ~CASEID
>>>>>
>>>>> post.mean l-95% CI u-95% CI eff.samp
>>>>> CASEID 1.008 0.8508 1.139 73.88
>>>>>
>>>>> R-structure: ~units
>>>>>
>>>>> post.mean l-95% CI u-95% CI eff.samp
>>>>> units 1 1 1 0
>>>>>
>>>>> Location effects: mortality.under.2 ~ maternal_age_c +
>>>>> I(maternal_age_c^2) + spl(birth_year) + residence + maternal_educ + sex
>>>>> +
>>>>> wealth
>>>>>
>>>>> post.mean l-95% CI u-95% CI eff.samp
>>>>> pMCMC
>>>>> (Intercept) -2.2844882 -4.0378822 -0.5243228 270.8
>>>>> 0.00947 **
>>>>> maternal_age_c -0.0278874 -0.0396679 -0.0169772 409.5 <
>>>>> 5e-04 ***
>>>>> I(maternal_age_c^2) 0.0067512 0.0040534 0.0096366 369.2 <
>>>>> 5e-04 ***
>>>>> spl(birth_year)1 -0.0069841 -0.0172375 0.0024631 387.8
>>>>> 0.15789
>>>>> spl(birth_year)2 0.0257588 0.0003497 0.0511228 425.9
>>>>> 0.05474 .
>>>>> spl(birth_year)3 -0.0251424 -0.0871379 0.0381827 376.8
>>>>> 0.41368
>>>>> spl(birth_year)4 -0.0451816 -0.1068456 0.0188489 315.1
>>>>> 0.17895
>>>>> spl(birth_year)5 -0.0506369 -0.1256510 0.0256118 325.8
>>>>> 0.20737
>>>>> spl(birth_year)6 0.0571108 -0.0450529 0.1564138 229.7
>>>>> 0.25368
>>>>> spl(birth_year)7 -0.0993668 -0.1830175 -0.0135886 275.6
>>>>> 0.01474 *
>>>>> spl(birth_year)8 -0.0812237 -0.1417047 -0.0322527 273.6
>>>>> 0.00316 **
>>>>> spl(birth_year)9 0.0626604 0.0448635 0.0797381 300.8 <
>>>>> 5e-04 ***
>>>>> spl(birth_year)10 -0.0148629 -0.0234434 -0.0054245 387.5
>>>>> 0.00105 **
>>>>> residenceUrban -0.2141484 -0.3716601 -0.0511390 280.1
>>>>> 0.00947 **
>>>>> maternal_educNo education 1.0334332 0.1228220 2.0524150 207.8
>>>>> 0.03263 *
>>>>> maternal_educPrimary 0.7954471 -0.1554284 1.7616937 206.3
>>>>> 0.10316
>>>>> maternal_educSecondary -0.0083354 -0.9622503 1.0182426 195.0
>>>>> 0.98000
>>>>> sexMale 0.1893753 0.1072094 0.2710264 275.2 <
>>>>> 5e-04 ***
>>>>> wealthSecond quintile 0.1773074 0.0444755 0.3283123 398.4
>>>>> 0.01368 *
>>>>> wealthMiddle quintile -0.0667648 -0.2060366 0.0676636 376.9
>>>>> 0.34316
>>>>> wealthFourth quintile 0.0485797 -0.0913787 0.1942995 416.2
>>>>> 0.47579
>>>>> wealthHighest quintile -0.1339028 -0.3017483 0.0077947 338.2
>>>>> 0.08000 .
>>>>> ---
>>>>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Mar 25, 2013 at 5:12 PM, Antonio P. Ramos <
>>>>> ramos.grad.student at gmail.com> wrote:
>>>>>
>>>>> Hi all,
>>>>>>
>>>>>>
>>>>>> I am trying to get some predictions from a MCMCglmm model but it is
>>>>>> not
>>>>>> working. I guess I don't really following what the model is doing with
>>>>>> tje
>>>>>> spl() command. Here is an example of the issue.
>>>>>>
>>>>>> Thanks a bunch
>>>>>>
>>>>>>
>>>>>>
>>>>>> # inve.wishart(V=1,nu=4) is equivalent to inv-gamma(shape=2,scale=2)
>>>>>> for
>>>>>> mothers random effects
>>>>>> prior.2 <- list(R = list(V = 1, fix = 1), G = list(G1 = list(V = 1,nu
>>>>>> =
>>>>>> 4)))
>>>>>> glm.MC.2 <- MCMCglmm(mortality.under.2 ~ maternal_age_c +
>>>>>> I(maternal_age_c^2) +
>>>>>> spl(birth_year) + residence + maternal_educ +
>>>>>> sex + wealth,
>>>>>> nitt=20000, thin=10, burnin=1000,
>>>>>> random= ~CASEID, prior=prior.2,data=rwanda2,
>>>>>> family='categorical')
>>>>>>
>>>>>>
>>>>>> > # creating new data for the poor
>>>>>> > pred.data <- data.frame(maternal_age_c=rep(**
>>>>>> 18,25),wealth=rep("Lowest
>>>>>> quintile",25),
>>>>>> + sex=rep("Female",25),**
>>>>>> residence=rep("Rural",25),
>>>>>> + "spl(birth_year)1"=1971:1995,
>>>>>> + "spl(birth_year)2"=1971:1995,
>>>>>> + "spl(birth_year)3"=1971:1995,
>>>>>> + "spl(birth_year)4"=1971:1995,
>>>>>> + "spl(birth_year)5"=1971:1995,
>>>>>> + "spl(birth_year)6"=1971:1995,
>>>>>> + "spl(birth_year)7"=1971:1995,
>>>>>> + "spl(birth_year)8"=1971:1995,
>>>>>> + "spl(birth_year)9"=1971:1995,
>>>>>> + "spl(birth_year)10"=1971:1995,
>>>>>> + birth_order=rep(1,25),
>>>>>> + maternal_educ=rep("No education",25))
>>>>>> >
>>>>>> > pred.data$wealth <- factor(pred.data$wealth,
>>>>>> + levels=c("Lowest quintile", "Second
>>>>>> quintile","Middle quintile","Fourth quintile","Highest quintile"))
>>>>>> > pred.data$sex <- factor(pred.data$sex, levels=c("Male","Female"))
>>>>>> > pred.data$residence <-
>>>>>> factor(pred.data$residence,**levels=c("Rural","Urban"))
>>>>>> > # pred.data$birth_year <- factor(pred.data$birth_year,
>>>>>> levels=1970:1997)
>>>>>> > pred.data$maternal_educ <- factor(pred.data$maternal_**educ,
>>>>>> + levels=c("No education",
>>>>>> "Primary",
>>>>>> "Secondary","Higher"))
>>>>>> >
>>>>>> >
>>>>>> > # design matrix
>>>>>> > X <- model.matrix(~ maternal_age_c + I(maternal_age_c^2) + residence
>>>>>> > +
>>>>>> + + maternal_educ +
>>>>>> + birth_order + wealth +
>>>>>> + spl.birth_year.1 +
>>>>>> + spl.birth_year.2 +
>>>>>> + spl.birth_year.3 +
>>>>>> + spl.birth_year.4 +
>>>>>> + spl.birth_year.5 +
>>>>>> + spl.birth_year.6 +
>>>>>> + spl.birth_year.7 +
>>>>>> + spl.birth_year.8 +
>>>>>> + spl.birth_year.9 +
>>>>>> + spl.birth_year.10, data=pred.data)
>>>>>> >
>>>>>> >
>>>>>> > V <- rowSums(glm.MC.2$VCV) # marginalizing over random effects
>>>>>> > beta <- glm.MC.2$Sol # fixed effects
>>>>>> > c2 <- (16*sqrt(3)/(15*pi))^2 # alterative parametrization for the
>>>>>> logistic distribution
>>>>>> > pred<- t(plogis(t(beta%*%t(X)/sqrt(1+**c2*V))))
>>>>>> > pred <- as.data.frame(pred)
>>>>>> > colnames(pred) <- 1971:1995 # predictions for the poor for every
>>>>>> > year
>>>>>> > colSums(pred)
>>>>>> 1971 1972 1973 1974 1975 1976 1977
>>>>>> 1978
>>>>>> 1979 1980 1981 1982 1983
>>>>>> 1677.094 1677.093 1677.093 1677.093 1677.093 1677.093 1677.093
>>>>>> 1677.093
>>>>>> 1677.093 1677.093 1677.093 1677.093 1677.093
>>>>>> 1984 1985 1986 1987 1988 1989 1990
>>>>>> 1991
>>>>>> 1992 1993 1994 1995
>>>>>> 1677.092 1677.092 1677.092 1677.092 1677.092 1677.092 1677.092
>>>>>> 1677.092
>>>>>> 1677.092 1677.092 1677.092 1677.092
>>>>>> >
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>
>>>
>>>
>>
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com
More information about the R-sig-mixed-models
mailing list