[R-sig-ME] lmer function
Ben Bolker
bbo|ker @end|ng |rom gm@||@com
Sun Aug 1 23:23:43 CEST 2021
Please reply to the list!
I think the warnings are an edge case/false positive.
When reading a character vector, R doesn't convert "NA" to a
not-available (NA) value (because people might have character vectors
denoting, say, North America (NA) or Nabisco (NA)). These literal "NA"
strings then provoke a warning when the vector is converted to numeric.
Try this function instead to convert:
as_num <- function(x) {
    x <- as.character(x)                          ## also copes with factor input
    x[!is.na(x) & (x == "NA")] <- NA_character_   ## literal "NA" strings -> missing
    as.numeric(x)
}
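
As a quick illustration, here is a minimal sketch of how as_num() might be applied to the two columns in question. The data frame `d` and its values are made up for the example (they are not from the shared data set), and as_num() is repeated only so that the block runs on its own.

```r
## as_num() as given above, repeated so this block is self-contained
as_num <- function(x) {
    x <- as.character(x)
    x[!is.na(x) & (x == "NA")] <- NA_character_
    as.numeric(x)
}

## hypothetical toy data: numbers stored as text, with literal "NA" strings
d <- data.frame(age = c("34", "NA", "51"),
                bmi = c("22.5", "27.1", "NA"),
                stringsAsFactors = FALSE)

## convert both columns in place, then check the resulting types
d[c("age", "bmi")] <- lapply(d[c("age", "bmi")], as_num)
str(d)
```

Converting the columns once, right after reading the file, means every later step (split(), lmer(), ...) sees numeric variables.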
On 8/1/21 5:00 PM, mina jahan wrote:
> Dear Ben,
> Thank you for your reply.
> age and bmi are quantitative variables.
> How can I define them as numeric variables?
> I used as.numeric, but I got the warning message: NAs introduced by coercion
>
> cheers
> Mina
>
> On Mon, 2 Aug 2021 at 01:23, Ben Bolker <bbolker using gmail.com
> <mailto:bbolker using gmail.com>> wrote:
>
> Since you made your data available to me via Google Drive, I was
> able to figure out the problem; I wouldn't have been able to if you
> hadn't shared the data, although if you had presented the results of
> summary(B) or str(B) I (or someone) would probably have been able to
> diagnose the issue.
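
A minimal sketch of the kind of check meant here, using a made-up data frame (`B_toy` and its values are hypothetical stand-ins, not the real data): str() or a per-column class listing makes a column that was read in as text easy to spot.

```r
## hypothetical toy data mimicking the problem: bmi was read in as text
B_toy <- data.frame(id  = c(1, 1, 2),
                    age = c(40, 41, 35),
                    bmi = c("23.1", "23.4", "30.2"),
                    stringsAsFactors = FALSE)

str(B_toy)                            ## compact overview; bmi is listed as 'chr'
col_classes <- sapply(B_toy, class)   ## named vector of column classes
col_classes[col_classes != "numeric"] ## columns that need a closer look
```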
>
> The problem is that your 'bmi' variable is of type *character*;
> that means that when building the model matrix for the fixed-effect
> part of the model, we end up treating it as a categorical variable
> and trying to build a model matrix that is of size
>
> 8.0*nr*((n_bmi-1)+(n_age-1) + 1 + 2)/2^30
>
> in GB (nr is the number of rows; n_bmi, n_age are the numbers of
> unique values of bmi and age); this comes out to about 30.5 GB. On
> my machine it fails immediately by running out of memory; the
> proximal problem you are probably having is when the program tries
> to compute the rank of the matrix to check for multicollinearity.
> In any case, though, you probably *don't* want to fit a model with
> 42,000 fixed parameters ...
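
The size formula above can be wrapped in a small helper to see how quickly the matrix grows. This is only a sketch: the function name model_matrix_gb() is made up, the inputs below are hypothetical rather than the counts from the actual data set, and reading the "+ 1 + 2" terms as the intercept plus the sex and time columns is an assumption.

```r
## size in GB of a dense double-precision model matrix,
## following the formula quoted above (8 bytes per entry)
model_matrix_gb <- function(nr, n_bmi, n_age) {
    n_col <- (n_bmi - 1) + (n_age - 1) + 1 + 2
    8.0 * nr * n_col / 2^30
}

## hypothetical inputs, NOT the values from the real data
model_matrix_gb(nr = 1e5, n_bmi = 30000, n_age = 10000)

## for comparison: same number of rows, but only 5 fixed-effect columns
## (intercept, age, sex, bmi, time) once age and bmi are numeric
8.0 * 1e5 * 5 / 2^30
```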
>
> If we address this problem by converting age and bmi to numeric, or
> by using readr::read_csv() to read in the file in the first place,
> everything works. (Note usual cautions about applying as.numeric()
> directly to a *factor*; in this case (with R>4.0) we are starting with
> type character, so it's OK.)
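
For readers who have not run into the factor caveat, a minimal sketch with made-up values: as.numeric() on a factor returns the internal level codes, not the numbers the labels stand for, so the factor has to go through as.character() first.

```r
f <- factor(c("10", "20", "10"))

codes  <- as.numeric(f)                 ## internal level codes
values <- as.numeric(as.character(f))   ## the numbers the labels represent

codes
values

## a plain character vector converts directly
as.numeric(c("10", "20", "10"))
```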
>
>
> (To my surprise, data.table::fread() also mis-categorizes these
> columns. I haven't been able to figure out why read.csv() and
> (especially) fread() get fooled ... the usual reasons (non-numeric
> entries, large numbers of NAs at the start of the column, etc.) don't
> seem to be present.)
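
One way to hunt for such entries is to list the distinct values that are present but fail to convert. The helper below is a sketch under that assumption; non_numeric_entries() and the toy vector are made up for illustration.

```r
## return the distinct entries of a vector that are not missing
## but cannot be converted to numeric
non_numeric_entries <- function(x) {
    x <- as.character(x)
    converted <- suppressWarnings(as.numeric(x))
    unique(x[!is.na(x) & is.na(converted)])
}

## hypothetical column with a few stray text entries
bmi_toy <- c("21.4", "n/a", NA, "30", "unknown", "n/a")
non_numeric_entries(bmi_toy)
```

Running this on each suspect column shows exactly which strings are forcing the column to be read as text.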
>
> cheers
> Ben Bolker
>
>
> On 7/31/21 6:09 PM, mina jahan wrote:
> >
> > I have a data set containing 20 imputed data sets. I want to use the
> > lmer function to compute regression coefficients for each
> > imputation, but I got the following error:
> > Error in qr.default(X, tol = tol, LAPACK = FALSE) :
> > too large a matrix for LINPACK
> > I cannot understand this error. I think that this issue is related
> > to the optimization algorithms used for inference.
> > R code is as follows:
> > R code is as follows:
> > library(lme4)
> >
> > B<-read.csv("C:/Users/USER/Desktop/micemd2.csv",header=TRUE,na.string="")
> > names(B)
> > names(B)[names(B) == ".imp"] <- "imp"
> > B<-B[ , -2]
> > names(B)
> > B<-B[which(B$imp!=0),]
> > head(B)
> > tail(B)
> >
> > ###################split dataset by imp
> > list_df <- split(B, B$imp)
> >
> > ###################Coefficient for each imputation of lmer
> > # make an empty dataframe
> > result1_df <- as.data.frame(matrix(ncol=5,nrow=length(list_df)))
> > # give the dataframe column names
> > colnames(result1_df)<-c("intercept","age","sex","bmi","time")
> > for (i in seq_along(list_df)){ # run a loop over the dataframes in the list
> > mod<-lmer(dbp~age +factor( sex) + bmi +
> > time+(1|id),data=list_df[[i]]) # mixed model
> > result1_df[i,]<-fixef(mod) # extract coefficients to dataframe
> > # assign rowname to results from data used
> > rownames(result1_df)[i]<-names(list_df)[i]
> > }
> > result1_df
> > mean(result1_df$intercept)
> > mean(result1_df$age)
> > mean(result1_df$sex)
> > mean(result1_df$bmi)
> > mean(result1_df$time)
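
The five mean() calls can be collapsed into a single colMeans() call. A minimal sketch with a made-up coefficient table (`result_toy` and its numbers are hypothetical, one row per imputation); like the code above, it only averages the point estimates.

```r
## hypothetical coefficient table: one row per imputation
result_toy <- data.frame(intercept = c(80.1, 79.8, 80.4),
                         age       = c(0.30, 0.32, 0.31),
                         sex       = c(-2.1, -1.9, -2.0),
                         bmi       = c(0.55, 0.60, 0.50),
                         time      = c(-0.10, -0.12, -0.11))

## average of each coefficient across imputations, in one step
pooled <- colMeans(result_toy)
pooled
```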
> >
> >
>
> --
> Dr. Benjamin Bolker
> Professor, Mathematics & Statistics and Biology, McMaster University
> Director, School of Computational Science and Engineering
> Graduate chair, Mathematics & Statistics
>