[R] [EXTERNAL] Re: bug in Windows implementation of nlme::groupedData
John Fox
j|ox @end|ng |rom mcm@@ter@c@
Sat Jan 8 02:17:34 CET 2022
Dear Melissa,
Normally, in evaluating a formula an R modeling function follows the
scoping rules in ?formula; that is,
"A formula object has an associated environment, and this environment
(rather than the parent environment) is used by model.frame to evaluate
variables that are not found in the supplied data argument.
"Formulas created with the ~ operator use the environment in which they
were created. Formulas created with as.formula will use the env argument
for their environment."
So, for example, if the variables in the formula live in the environment
of the calling function and if the data argument isn't used, then the
variables should be found. Modifying your example a bit and calling
lme() works, for example:
------- snip -------
> analyze_this <- function(df) {
+
+ mean.x <- mean(df$age)
+ mean.y <- mean(df$height)
+ sd.x <- sd(df$age)
+ sd.y <- sd(df$height)
+
+ x <- (df$age - mean.x) / sd.x
+ y <- (df$height - mean.y) / sd.y
+ X <- model.matrix(~ x * male * black, data = df)
+ dummyID <- rep(1:2, times=c(floor(nrow(X)/2), ceiling(nrow(X)/2)))
+
+ lme(y ~ X[, -1], random= ~ 1 | dummyID)
+
+ # groupedData(y ~ X[, -1] | dummyID)
+
+ }
> analyze_this(growthIndiana)
Linear mixed-effects model fit by REML
Data: NULL
Log-restricted-likelihood: -2546.266
Fixed: y ~ X[, -1]
(Intercept) X[, -1]x X[, -1]male X[,
-1]black
-0.16086555 0.73401464 0.26303561
-0.04761425
X[, -1]x:male X[, -1]x:black X[, -1]male:black X[,
-1]x:male:black
0.27517924 -0.10318100 0.21899350
0.03048160
Random effects:
Formula: ~1 | dummyID
(Intercept) Residual
StdDev: 3.688461e-05 0.4462984
Number of Observations: 4123
Number of Groups: 2
------- snip -------
Note that model.matrix() finds x and y in the environment of
analyze_this(), and male and black in df.
But if you unquote the line groupedData(y ~ X[, -1] | dummyID), the
function fails:
------- snip -------
> analyze_this(growthIndiana)
Error in data.frame(y = y, X = X, dummyID = dummyID) :
object 'X' not found
------- snip -------
This suggests that groupedData() is doing something unusual (which I
don't have the inclination to figure out).
I'm not sure why one needs to manipulate the model matrix directly like
this, but I assume that there is some coherent reason or you wouldn't be
asking. Also isn't the formula for groupedData() supposed to have a
*single* covariate on the right, like y ~ x | g (where y, x, and g are
individual variables)?
Best,
John
On 2022-01-07 5:29 p.m., Key, Melissa wrote:
> John,
>
> Thanks for your response. I agree that the definition of the data frame is poor (in my defense it came directly from the demo code, but I should have checked it more thoroughly). The good news is that your comments caused me to take a closer look at where X was defined, and I found the reason I wasn't getting the same results on my Mac and PC - that error was between keyboard and chair.
>
> There is still something funny going on though (at least relative to my previous experience with how R searches environments:
>
> If X is defined in the global environment, groupedData can find it there and use it. (this is what I'm used to)
> If X is defined within a function, groupedData cannot find it, even if groupedData is called within the same function. (this seems strange to me - usually parent.frame() captures information within the function environment, or so I thought)
>
> My solution at the bottom still works - and unlike groupedData, nlme allows a list as input to the data argument (or at least, doesn't check to make sure it's a data frame), so I have a working (albeit hacky) solution that actually makes more sense to me than using groupedData, but it still seems strange that the function cannot find X in its search path.
>
> Thanks again!
> Melissa
>
> -----Original Message-----
> From: John Fox <jfox using mcmaster.ca>
> Sent: Friday, January 7, 2022 4:35 PM
> To: Key, Melissa <mkey using infoscitex.com>
> Cc: r-help using r-project.org
> Subject: [EXTERNAL] Re: [R] bug in Windows implementation of nlme::groupedData
>
> Dear Melissa,
>
> It seems strange to me that your code would work on any platform (it doesn't on my Mac) because the data frame you create shouldn't contain a matrix named "X" but rather columns including those originating from X.
> To illustrate:
>
> > X <- matrix(1:12, 4, 3)
> > colnames(X) <- c("a", "b", "c")
> > X
> a b c
> [1,] 1 5 9
> [2,] 2 6 10
> [3,] 3 7 11
> [4,] 4 8 12
>
> > y <- 1:4
>
> > (D <- data.frame(y, X))
> y a b c
> 1 1 1 5 9
> 2 2 2 6 10
> 3 3 3 7 11
> 4 4 4 8 12
>
> > str(D)
> 'data.frame': 4 obs. of 4 variables:
> $ y: int 1 2 3 4
> $ a: int 1 2 3 4
> $ b: int 5 6 7 8
> $ c: int 9 10 11 12
>
> My session info:
>
> > sessionInfo()
> R version 4.1.2 (2021-11-01)
> Platform: aarch64-apple-darwin20 (64-bit) Running under: macOS Monterey 12.1
>
> Matrix products: default
> LAPACK:
> /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] nlme_3.1-153 HRW_1.0-5
>
> loaded via a namespace (and not attached):
> [1] compiler_4.1.2 tools_4.1.2 KernSmooth_2.23-20
> splines_4.1.2
> [5] grid_4.1.2 lattice_0.20-45
>
> I hope this helps,
> John
>
> --
> John Fox, Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> web: https://usg02.safelinks.protection.office365.us/?url=https%3A%2F%2Fsocialsciences.mcmaster.ca%2Fjfox%2F&data=04%7C01%7Cmkey%40infoscitex.com%7C57e0067b32a24526600c08d9d2258bd8%7C3430c5de5e6f4f9d9d9690e985038b58%7C0%7C0%7C637771881005380638%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=nTXsf6vbxVIstMup5AUlmABv9CdQa3hw44Fdb2lej7A%3D&reserved=0
>
> On 2022-01-07 11:23 a.m., Key, Melissa wrote:
>> I am trying to replicate a semi-parametric analysis described in Harezlak, Jaroslaw, David Ruppert, and Matt P. Wand. Semiparametric regression with R. New York, NY: Springer, 2018. (https://usg02.safelinks.protection.office365.us/?url=https%3A%2F%2Flink.springer.com%2Fbook%2F10.1007%252F978-1-4939-8853-2&data=04%7C01%7Cmkey%40infoscitex.com%7C57e0067b32a24526600c08d9d2258bd8%7C3430c5de5e6f4f9d9d9690e985038b58%7C0%7C0%7C637771881005380638%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=ESCv%2Bp%2FPo7pyKm055Y4nYAK8Cj1RTkorP9ZvahZTPmE%3D&reserved=0).
>>
>> I can successfully run the analysis, but now I'm trying to move it into my workflow, which requires that the analysis be conducted within a function (using targets package), and the `groupedData` function now fails with an error that it cannot find the `X` matrix (see reprex below). I've tried the reprex on both my personal Mac (where it works??) and on windows machines (where it does not) - so the problem is likely specific to Windows computers (yes, this seems weird to me too).
>> All packages have been updated, and I'm running the latest version of R on all machines.
>>
>> Reprex:
>>
>> library(HRW) # contains example data and ZOSull function
>> library(nlme)
>>
>> data(growthIndiana)
>>
>>
>> analyze_this <- function(df) {
>>
>> mean.x <- mean(df$age)
>> mean.y <- mean(df$height)
>> sd.x <- sd(df$age)
>> sd.y <- sd(df$height)
>>
>> df$x <- (df$age - mean.x) / sd.x
>> df$y <- (df$height - mean.y) / sd.y
>>
>> X <- model.matrix(~ x * male * black, data = df)
>> dummyID <- rep(1, length(nrow(X)))
>>
>> grouped_data <- groupedData(y ~ X[,-1]|rep(1, length = nrow(X)),
>> data = data.frame(y = df$y, X, dummyID))
>>
>> }
>>
>>
>> # doesn't work on Windows machine, does work on the Mac
>> analyze_this(growthIndiana)
>> #> Error in eval(aux[[2]], object): object 'X' not found
>>
>> # does work
>>
>> df <- growthIndiana
>>
>> mean.x <- mean(df$age)
>> mean.y <- mean(df$height)
>> sd.x <- sd(df$age)
>> sd.y <- sd(df$height)
>>
>> df$x <- (df$age - mean.x) / sd.x
>> df$y <- (df$height - mean.y) / sd.y
>>
>> X <- model.matrix(~ x * male * black, data = df) dummyID <- rep(1,
>> length(nrow(X)))
>>
>> grouped_data <- groupedData(y ~ X[,-1]|rep(1, length = nrow(X)), data
>> = data.frame(y = df$y, X, dummyID))
>>
>>
>> # attempted work-around.
>>
>> analyze_this2 <- function(df) {
>> num.global.knots = 20
>> num.subject.knots = 10
>>
>> mean.x <- mean(df$age)
>> mean.y <- mean(df$height)
>> sd.x <- sd(df$age)
>> sd.y <- sd(df$height)
>>
>> df$x <- (df$age - mean.x) / sd.x
>> df$y <- (df$height - mean.y) / sd.y
>>
>> X <- model.matrix(~ x * male * black, data = df)
>> dummyID <- rep(1, length(nrow(X)))
>>
>> # grouped_data <- groupedData(y ~ X[,-1]|rep(1, length = nrow(X)),
>> data = data.frame(y = df$y, X, dummyID))
>>
>> global.knots = quantile(unique(df$x), seq(0, 1, length = num.global.knots + 2)[-c(1, num.global.knots + 2)])
>> subject.knots = quantile(unique(df$x), seq(0, 1, length =
>> num.subject.knots + 2)[-c(1, num.subject.knots + 2)])
>>
>> Z.global <- ZOSull(df$x, range.x = range(df$x), global.knots)
>> Z.group <- df$black * Z.global
>> Z.subject <- ZOSull(df$x, range.x = range(df$x), subject.knots)
>>
>> Zblock <- list(
>> dummyID = pdIdent(~ 0 + Z.global),
>> dummyID = pdIdent(~ 0 + Z.group),
>> idnum = pdSymm(~ x),
>> idnum = pdIdent(~ 0 + Z.subject)
>> )
>>
>> df$dummyID <- dummyID
>> tmp_data <- c(
>> df,
>> X = list(X),
>> Z.global = list(Z.global),
>> Z.group = list(Z.global),
>> Z.subject = list(Z.subject)
>> )
>>
>> fit <- lme(y ~ 0 + X,
>> data = tmp_data,
>> random = Zblock
>> )
>>
>> }
>>
>> # this works (warning - lme takes awhile to fit)
>> analyze_this2(growthIndiana)
>>
>> sessionInfo()
>> #> R version 4.1.2 (2021-11-01)
>> #> Platform: x86_64-w64-mingw32/x64 (64-bit) #> Running under: Windows
>> 10 x64 (build 22000) #> #> Matrix products: default #> #> locale:
>> #> [1] LC_COLLATE=English_United States.1252 #> [2]
>> LC_CTYPE=English_United States.1252 #> [3] LC_MONETARY=English_United
>> States.1252 #> [4] LC_NUMERIC=C #> [5] LC_TIME=English_United
>> States.1252 #> #> attached base packages:
>> #> [1] stats graphics grDevices utils datasets methods base
>> #>
>> #> other attached packages:
>> #> [1] nlme_3.1-153 HRW_1.0-5
>> #>
>> #> loaded via a namespace (and not attached):
>> #> [1] lattice_0.20-45 digest_0.6.29 withr_2.4.3 grid_4.1.2
>> #> [5] magrittr_2.0.1 reprex_2.0.1 evaluate_0.14 KernSmooth_2.23-20
>> #> [9] highr_0.9 stringi_1.7.6 rlang_0.4.12 cli_3.1.0
>> #> [13] rstudioapi_0.13 fs_1.5.2 rmarkdown_2.11 splines_4.1.2
>> #> [17] tools_4.1.2 stringr_1.4.0 glue_1.6.0 xfun_0.29
>> #> [21] yaml_2.2.1 fastmap_1.1.0 compiler_4.1.2 htmltools_0.5.2
>> #> [25] knitr_1.37
>>
>> Created on 2022-01-07 by the [reprex
>> package](https://usg02.safelinks.protection.office365.us/?url=https%3A
>> %2F%2Freprex.tidyverse.org%2F&data=04%7C01%7Cmkey%40infoscitex.com
>> %7C57e0067b32a24526600c08d9d2258bd8%7C3430c5de5e6f4f9d9d9690e985038b58
>> %7C0%7C0%7C637771881005380638%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAw
>> MDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=hH
>> 5ncnSVZGQ4KZsvLpvjgM%2Fgf9Zu5QnoeSntTrCZWjk%3D&reserved=0)
>> (v2.0.1)
>>
>> Here's the session Info for the Mac:
>>
>> sessionInfo()
>> R version 4.1.2 (2021-11-01)
>> Platform: aarch64-apple-darwin20 (64-bit) Running under: macOS
>> Monterey 12.1
>>
>> Matrix products: default
>> LAPACK:
>> /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRl
>> apack.dylib
>>
>> locale:[1]
>> en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] nlme_3.1-153 targets_0.8.1 HRW_1.0-5
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_4.1.2 pillar_1.6.4 tools_4.1.2 digest_0.6.28 lattice_0.20-45 jsonlite_1.7.2 evaluate_0.14
>> [8] lifecycle_1.0.1 tibble_3.1.6 pkgconfig_2.0.3 rlang_0.4.12 igraph_1.2.9 cli_3.1.0 yaml_2.2.1
>> [15] xfun_0.28 fastmap_1.1.0 withr_2.4.2 knitr_1.36 htmlwidgets_1.5.4 vctrs_0.3.8 grid_4.1.2
>> [22] tidyselect_1.1.1 glue_1.5.0 data.table_1.14.2 R6_2.5.1 processx_3.5.2 fansi_0.5.0 rmarkdown_2.11
>> [29] callr_3.7.0 purrr_0.3.4 magrittr_2.0.1 scales_1.1.1 codetools_0.2-18 ps_1.6.0 htmltools_0.5.2
>> [36] ellipsis_0.3.2 splines_4.1.2 colorspace_2.0-2 KernSmooth_2.23-20 utf8_1.2.2 visNetwork_2.1.0 munsell_0.5.0
>> [43] crayon_1.4.2
>>
>>
>> Melissa Key, Ph.D.
>> Statistician
>> Non-Invasive Brain Stimulation Team
>> Infoscitex
>>
>> Cell: 937-550-4981
>> Email: mkey using infoscitex.com<mailto:mkey using infoscitex.com>
>> Base email:
>> melissa.key.ctr using us.af.mil<mailto:melissa.key.ctr using us.af.mil>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://usg02.safelinks.protection.office365.us/?url=https%3A%2F%2Fsta
>> t.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=04%7C01%7Cmkey%40info
>> scitex.com%7C57e0067b32a24526600c08d9d2258bd8%7C3430c5de5e6f4f9d9d9690
>> e985038b58%7C0%7C0%7C637771881005380638%7CUnknown%7CTWFpbGZsb3d8eyJWIj
>> oiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&am
>> p;sdata=4c%2Bnu06ZfIKpwowJCTz2mIl7kAEQR5vbZSw8pTuqumk%3D&reserved=
>> 0 PLEASE do read the posting guide
>> https://usg02.safelinks.protection.office365.us/?url=http%3A%2F%2Fwww.
>> r-project.org%2Fposting-guide.html&data=04%7C01%7Cmkey%40infoscite
>> x.com%7C57e0067b32a24526600c08d9d2258bd8%7C3430c5de5e6f4f9d9d9690e9850
>> 38b58%7C0%7C0%7C637771881005380638%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4
>> wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sda
>> ta=iT8fu9JX5i7gUfo0T2gQMVdHEm%2FSThakp2yaOJ2pOBQ%3D&reserved=0
>> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list