[R] Error: vector memory exhausted (limit reached?)

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Thu Nov 25 22:57:43 CET 2021


Note the following alternative to formula paste gymnastics:
## regr_names as in Rui's post
allnames <- c("B5", regr_names)
linmod4 <- lm(B5 ~ ., data = OrigData[allnames], na.action = na.exclude)
## or even
linmod4 <- lm(OrigData[allnames], na.action = na.exclude)

See ?formula for details, especially the end of the "Details" section.
The following should give you some idea of why this works:

> d <- data.frame(a = 1:3, b = runif(3), c = runif(3), d = runif(3))
> formula(d)
a ~ b + c + d
> formula(d[c("b","a","c")])
b ~ a + c
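
As a quick check of the claim above, here is a minimal sketch on simulated
data. The data frame and its column names are made up for illustration
(they only mimic the B5 / B6_* naming in this thread), and reformulate()
is used as a stand-in for Rui's paste()-based formula. The three fits
should give the same coefficients:

```r
set.seed(1)

## Hypothetical stand-in for OrigData: one response, two regressors,
## and one unrelated column that must stay out of the model.
d <- data.frame(B5 = rnorm(20), B6_1 = rnorm(20),
                B6_2 = rnorm(20), other = rnorm(20))

regr_names <- grep("^B6_", names(d), value = TRUE)
allnames <- c("B5", regr_names)

## Explicit formula, built without paste()
fmla <- reformulate(regr_names, response = "B5")
fit_formula <- lm(fmla, data = d)

## The two alternatives from this post
fit_dot <- lm(B5 ~ ., data = d[allnames])
fit_df <- lm(d[allnames])

## Both comparisons are expected to be TRUE
all.equal(coef(fit_formula), coef(fit_dot))
all.equal(coef(fit_formula), coef(fit_df))
```

Note that the response must be the first column of the data frame passed
to lm(), which is why allnames starts with "B5".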

I assume this also works for your calc.relimp invocations (without
explicitly invoking formula(...)), but you'll have to check, of course.
Note also that this approach may have drawbacks, e.g. greater memory
requirements. Others more knowledgeable (or more ambitious and willing
to check) than I will have to comment if so.
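
For anyone who does want to check, one rough way is to compare the fitted
objects directly. This is only a sketch on simulated data (the data frame
and its column names are hypothetical), and object.size() reports the size
of the stored result, not the peak memory used while fitting, so it gives
only a partial answer:

```r
set.seed(1)

## Hypothetical data mimicking the B5 / B6_* layout in this thread
d <- data.frame(B5 = rnorm(100), B6_1 = rnorm(100),
                B6_2 = rnorm(100), other = rnorm(100))
allnames <- c("B5", "B6_1", "B6_2")

fit_formula <- lm(B5 ~ B6_1 + B6_2, data = d)
fit_df <- lm(d[allnames])

## Size of each stored fit, in bytes
object.size(fit_formula)
object.size(fit_df)

## Both fits keep a model frame with the same variables and rows
names(model.frame(fit_formula))
names(model.frame(fit_df))
dim(model.frame(fit_formula))
dim(model.frame(fit_df))
```

For peak memory during the fit itself, something like gc() before and
after the call, or Rprofmem() where R was built with memory profiling
enabled, would be needed instead.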


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Thu, Nov 25, 2021 at 8:09 AM Rui Barradas <ruipbarradas using sapo.pt> wrote:
>
> Hello,
>
> Here is your code, simplified and maybe corrected.
> Some preliminary notes:
>
> 1. You don't need to attach()
>
> 2. You don't need to create categorydata <- as.data.frame(OrigData),
> read.csv already outputs a data.frame. In the code below, I use OrigData.
>
> 3. The file has many missing values, so I have cleaned the data. The
> code lines with the new pipe operator (R 4.1.0) give the dim of OrigData
> after cleansing and create the data.frame CleanData, with all complete
> rows, no missing values. There are a total of 968 rows with missing
> values so CleanData only has 840 rows.
>
> 4. Instead of having a very long regression formula, I use grep to get
> the column names starting with "B6_" and then a series of paste
> assembles the regression formula.
>
>
>
> library(relaimpo)
>
> # Read the data in. I've downloaded it to this directory
> path <- "~/tmp"
> filename <- "BB HTTF stacked B1, B3, B5, B6, C9 + FILTERED.csv"
> filename <- file.path(path, filename)
> OrigData <- read.csv(filename)
>
> ###### BRAND PERFORMANCE ADVOCACY (B5) ####
>
> # Runs a standard linear regression.
>
> resp <- "B5"
> regr_names <- grep("B6_", names(OrigData), value = TRUE)
> regr <- paste(regr_names, collapse = " + ")
> fmla <- as.formula(paste(resp, regr, sep = " ~ "))
>
> dim(OrigData)
> #[1] 1808   63
> OrigData[c(resp, regr_names)] |> na.omit() |> dim()
> #[1] 840  28
>
> CleanData <- OrigData[c(resp, regr_names)] |> na.omit()
>
> linmod4 <- lm(fmla, data = OrigData, na.action = na.exclude)
> linmod4_b <- lm(fmla, data = CleanData)
>
> all(coef(linmod4) == coef(linmod4_b))
> #[1] TRUE
>
> # Runs Shapley Value Regression with all the
> # coefficients set to sum to a hundred
>
> # Either of these first two equivalent forms
> # takes a very long time
> f4_lmg <- calc.relimp(
>    linmod4_b,
>    type = "lmg",
>    rela = TRUE
> )
> f4_lmg_b <- calc.relimp(
>    formula = fmla,
>    type = "lmg",
>    rela = TRUE,
>    data = CleanData
> )
>
> # These equivalent forms are quickly done
> f4_firstlast <- calc.relimp(
>    linmod4_b,
>    type = c("first","last"),
>    rela = TRUE
> )
> f4_firstlast_b <- calc.relimp(
>    formula = fmla,
>    type = c("first","last"),
>    rela = TRUE,
>    data = CleanData
> )
>
> Coefficient4 <- f4_lmg$lmg
> Rsq4 <- f4_firstlast$R2
> Proportion4 <- Coefficient4 * Rsq4
>
>
>
> Hope this helps,
>
> Rui Barradas
>
>
>
> At 13:36 on 24/11/21, Olivia Keefer wrote:
> > Apologies. First time posting and new to R so a lot of learning. Hope my attachments are helpful now.
> >
> >
> > #This loads the required package. Always select UK Bristol as a CRAN MIRROR / LOCATION
> > #1. Highlight the below code and run (3rd icon or right click)
> >
> > require(relaimpo)
> > install.packages('relaimpo',dep=TRUE)
> > install.packages('iterators')
> > install.packages('foreach')
> >
> > #2. Change directory to where all files are and change back slash to forward slash
> >
> > setwd('/Users/okeefer/Documents/Regressions')
> >
> >
> >
> > #Once final SPSS file is created save it as a CSV in same location.
> > #4. Below code opens the original file with data at respondent level. Update the name of data file only.
> >
> > OrigData = read.csv("BB HTTF stacked B1, B3, B5, B6, C9 + FILTERED.csv")
> >
> > #5. <Run below code to bottom do not need to change anything
> > #Shows all the variables names of the dataset
> >
> > names(OrigData)
> >
> > #Shows all the dimensions and class of the dataset
> >
> > dim(OrigData)
> > class(OrigData)
> >
> > #Creates a data frame
> >
> > categorydata<- as.data.frame(OrigData)
> >
> > #Shows all the variables & class from the newly created variable
> >
> > names(categorydata)
> > class(categorydata)
> >
> > #Makes the object accessible
> >
> > attach(categorydata)
> >
> > #>
> >
> >
> > ######BRAND PERFORMANCE ADVOCACY (B5)####
> >
> > #Runs a standard linear regression. Update predictor and outcome variables. The variable names only.
> > #First variable is Dependent variable. Ensure a + sign is between all independent variables. Ensure there is a space before and after +
> >
> > linmod4 <- lm( B5 ~ B6_1 + B6_2 + B6_3 + B6_4 + B6_5 + B6_6 + B6_7 + B6_8 + B6_9 + B6_10 + B6_11 + B6_12 + B6_13 + B6_14 + B6_15 + B6_16 + B6_17 + B6_18 + B6_19 + B6_20 + B6_21 + B6_22 + B6_23 + B6_24 + B6_25 + B6_26 + B6_27 , data= categorydata)
> >
> > #Runs Shapley Value Regression with all the coefficients set to sum to a hundred
> >
> > f4 <-calc.relimp(linmod4,type = c("lmg","first","last"), rela = TRUE)
> >
> > Coefficient4 = f4$lmg
> > Rsq4 = f4$R2
> > Proportion4 = Coefficient4 * Rsq4
> >
> > ### Display CoEfficients
> >
> > Coefficient4
> >
> > ### Display R-Sq
> >
> > Rsq4
> >
> > ### Proportion of Model Explained by variable. Change file name to reflect project
> > Proportion4
> >
> > Results4 = cbind(Coefficient4,Proportion4,Rsq4)
> > write.csv(Results4, file = "bb_HTTF_performance_advocacy.csv")
> >
> >
> > Olivia Keefer
> > Insights Analyst
> > she/her/hers
> >
> > Monigle
> > 575 8th Avenue
> > Suite 1716
> > New York, NY 10018
> > M 740.701.2163
> > okeefer using monigle.com
> > www.monigle.com <http://www.monigle.com/>
> > linkedin <http://www.linkedin.com/company/monigle/>| twitter <https://twitter.com/Monigle>| monigle blog <http://www.monigle.com/blog/>
> >
> >
> >
> >
> > On 11/24/21, 2:59 AM, "Rui Barradas" <ruipbarradas using sapo.pt> wrote:
> >
> >      Hello,
> >
> >      You ask a question on a certain regression that exhausts vector memory
> >      but don't post the regression(s) code (and data, btw).
> >      And load package relaimpo before installing it.
> >
> >      Can you please read the posting guide linked to at the bottom of this
> >      and every R-Help mail? As it stands, there's nothing to answer.
> >
> >      Hope this helps,
> >
> >      Rui Barradas
> >
> >
> >      At 20:26 on 23/11/21, Olivia Keefer wrote:
> >      > Hello!
> >      >
> >      > My colleague and I have continually run into this error and would like to better understand why and how to remedy. We have been able to run other regressions, but when we go to run a certain set of variables, we both are getting this message each time we try.
> >      >
> >      > Any insight would be helpful as to 1) why and 2) how to remedy.
> >      >
> >      > Attaching our package as well here if it is helpful:
> >      >
> >      >
> >      >
> >      > #This loads the required package. Always select UK Bristol as a CRAN MIRROR / LOCATION
> >      >
> >      > #1. Highlight the below code and run (3rd icon or right click)
> >      >
> >      >
> >      >
> >      > require(relaimpo)
> >      >
> >      > install.packages('relaimpo',dep=TRUE)
> >      >
> >      > install.packages('iterators')
> >      >
> >      > install.packages('foreach')
> >      >
> >      >
> >      > Thanks!
> >      >
> >      > Olivia Keefer
> >      > Insights Analyst
> >      > she/her/hers
> >      >
> >      > Monigle
> >      > 575 8th Avenue
> >      > Suite 1716
> >      > New York, NY 10018
> >      > M 740.701.2163
> >      > okeefer using monigle.com<mailto:okeefer using monigle.com>
> >      > www.monigle.com<http://www.monigle.com/>
> >      > linkedin<http://www.linkedin.com/company/monigle/>| twitter<https://twitter.com/Monigle>| monigle blog<http://www.monigle.com/blog/>
> >      >
> >      >
> >      >
> >      >
> >      > ______________________________________________
> >      > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >      > https://stat.ethz.ch/mailman/listinfo/r-help
> >      > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >      > and provide commented, minimal, self-contained, reproducible code.
> >      >
> >
>


