[R-sig-ME] lmer function

Sun Aug 1 23:23:43 CEST 2021

  Please reply to the list!

  I think the warnings are an edge-case/false positive.
  When reading a character vector R doesn't convert "NA" to a 
not-available (NA) value (because people might have character vectors 
denoting, say, North America (NA) or Nabisco (NA)).  These provoke a 
warning when converting.  Try this function instead to convert:

as_num <- function(x) {
     x <- as.character(x)
     x[!is.na(x) & (x=="NA")] <- NA_character_
     as.numeric(x)
}
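
A minimal usage sketch for the function above. The data frame name B comes from the original code; the three rows are made-up stand-ins for the real data, assuming the columns arrive as character:

```r
as_num <- function(x) {
  x <- as.character(x)
  x[!is.na(x) & (x == "NA")] <- NA_character_
  as.numeric(x)
}

# Hypothetical stand-in for the real data: columns read in as character
B <- data.frame(age = c("34", "NA", "51"),
                bmi = c("22.5", "27.1", "NA"),
                stringsAsFactors = FALSE)

# Convert the two quantitative variables in place
B$age <- as_num(B$age)
B$bmi <- as_num(B$bmi)

str(B)   # check that both columns are now reported as numeric
```

Running str(B) (or summary(B)) after the conversion is a quick way to confirm the column types before refitting the model.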

On 8/1/21 5:00 PM, mina jahan wrote:
> Dear Ben,
> Thank you for your reply.
> age and bmi are quantitative variables.
> How can I define them as numeric variables?
> I used as.numeric, but I got the warning message: NAs introduced by coercion
> 
> cheers
> Mina
> 
> On Mon, 2 Aug 2021 at 01:23, Ben Bolker <bbolker using gmail.com> wrote:
> 
>          Since you made your data available to me via google drive, I was
>     able to figure out the problem; I wouldn't have been able to if you
>     hadn't shared the data, although if you had presented the results of
>     summary(B) or str(B) I (or someone) would probably have been able to
>     diagnose the issue.
> 
>         The problem is that your 'bmi' variable is of type *character*;
>     that means that when building the model matrix for the fixed-effect
>     part of the model, we end up treating it as a categorical variable
>     and trying to build a model matrix that is of size
> 
>     8.0*nr*((n_bmi-1)+(n_age-1) + 1 + 2)/2^30
> 
>     in GB (n_age, n_bmi are the number of unique values of age and bmi);
>     this comes out to about 30.5 GB.  On my machine it fails immediately by
>     running out of memory; the proximal problem you are probably having is
>     when the program tries to compute the rank of the matrix to check for
>     multicollinearity.  In any case, though, you probably *don't* want to
>     fit a model with 42,000 fixed parameters ...
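
The size formula quoted above can be wrapped in a small helper to see how quickly a dense model matrix grows. This is only a sketch: the function name mm_size_gb and the input values are hypothetical (they are not the counts from the real data set), and reading the "+ 1 + 2" terms as the intercept plus the sex and time columns is an assumption.

```r
# Approximate size in GB of a dense double-precision model matrix.
# nr: number of rows; n_age, n_bmi: number of unique values of each variable
# when it is (mis)treated as categorical.
mm_size_gb <- function(nr, n_age, n_bmi) {
  # one dummy column per non-reference level, plus the remaining columns
  # (assumed here to be the intercept, sex and time)
  n_col <- (n_bmi - 1) + (n_age - 1) + 1 + 2
  8.0 * nr * n_col / 2^30   # 8 bytes per double, 2^30 bytes per GB
}

# Hypothetical inputs, NOT the values from the real data set
size_gb <- mm_size_gb(nr = 1e5, n_age = 2000, n_bmi = 40000)
size_gb
```

With numeric age and bmi each variable contributes a single column instead of thousands, which is why converting the columns makes the problem disappear.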
> 
>         If we address this problem by converting age and bmi to numeric, or
>     by using readr::read_csv() to read in the file in the first place,
>     everything works. (Note the usual cautions about applying as.numeric()
>     directly to a *factor*; in this case (with R >= 4.0) we are starting
>     with type character, so it's OK.)
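
The caution about factors can be illustrated with a toy vector (the values are invented for the example): as.numeric() applied directly to a factor returns the internal level codes, not the numbers the labels represent.

```r
# A factor whose labels look like numbers (made-up values)
f <- factor(c("10", "20", "20", "5"))

codes  <- as.numeric(f)                 # internal integer codes of the levels
values <- as.numeric(as.character(f))   # the numbers the labels represent
same   <- as.numeric(levels(f))[f]      # equivalent idiom, converts each level once

codes
values
```

Going through as.character() first (or indexing the converted levels) is the usual way to recover the original numbers from a factor.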
> 
> 
>        (To my surprise data.table::fread() also mis-categorizes these
>     columns.  I haven't been able to figure out why read.csv() and
>     (especially) fread() get fooled ...  the usual reasons (non-numeric
>     entries, large numbers of NAs at the start of the column, etc.) don't
>     seem to be present.)
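
One possible explanation, offered as an assumption and not checked against the original file: the read.csv() call in the original code passes na.string="" (which partially matches the na.strings argument), so literal "NA" entries are no longer treated as missing and any column containing them stays non-numeric. The sketch below uses a small made-up CSV to show the difference.

```r
# Made-up CSV text with literal NA entries in the age and bmi columns
csv_text <- "id,age,bmi\n1,34,22.5\n2,NA,27.1\n3,51,NA\n"

# With na.strings = "" the string "NA" is ordinary text, so the columns
# that contain it are not converted to numeric
d1 <- read.csv(text = csv_text, header = TRUE, na.strings = "")

# With the default (na.strings = "NA") the same columns are read as numeric
d2 <- read.csv(text = csv_text, header = TRUE)

sapply(d1, class)
sapply(d2, class)
```

If this is what happened, passing na.strings = c("", "NA") would treat both empty fields and literal NA entries as missing.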
> 
>         cheers
>           Ben Bolker
> 
> 
>     On 7/31/21 6:09 PM, mina jahan wrote:
>      >
>      >   I have a data set containing 20 imputed data sets. I want to use
>      > the lmer function to compute regression coefficients for each
>      > imputation, but I got the error below:
>      > Error in qr.default(X, tol = tol, LAPACK = FALSE) :
>      >    too large a matrix for LINPACK
>      > I cannot understand this error.  I think that this issue is related
>      > to the optimization algorithms used for inference.
>      > R code is as follows:
>      > library(lme4)
>      > B<-read.csv("C:/Users/USER/Desktop/micemd2.csv",header=TRUE,na.string="")
>      > names(B)
>      > names(B)[names(B) == ".imp"] <- "imp"
>      > B<-B[ , -2]
>      > names(B)
>      > B<-B[which(B$imp!=0),]
>      > head(B)
>      > tail(B)
>      >
>      > ###################split dataset by imp
>      > list_df <- split(B, B$imp)
>      >
>      > ###################Coefficient for each imputation of lmer
>      > result1_df <- as.data.frame(matrix(ncol=5,nrow=length(list_df))) # make an empty dataframe
>      > colnames(result1_df)<-c("intercept","age","sex","bmi","time") #give the dataframe column names
>      > for (i in 1:length(list_df)){ #run a loop over the dataframes in the list
>      >    mod<-lmer(dbp~age +factor( sex) + bmi + time+(1|id),data=list_df[[i]]) #mixed model
>      >    result1_df[i,]<-fixef(mod) #extract coefficients to dataframe
>      >    rownames(result1_df)[i]<-names(list_df)[i] #assign rowname to results from data used
>      > }
>      > result1_df
>      > mean(result1_df$intercept)
>      > mean(result1_df$age)
>      > mean(result1_df$sex)
>      > mean(result1_df$bmi)
>      > mean(result1_df$time)
>      >
>      >
> 
>     -- 
>     Dr. Benjamin Bolker
>     Professor, Mathematics & Statistics and Biology, McMaster University
>     Director, School of Computational Science and Engineering
>     Graduate chair, Mathematics & Statistics
> 
