[R] Bivariate - multivariate linear regression

Peter Ehlers ehlers at ucalgary.ca
Sat May 18 00:16:22 CEST 2013


On 2013-05-17 12:45, Jesse Gervais wrote:
> Hi there,
>
>
>
> I want to do several bivariate linear regressions and then do a
> multivariate linear regression including only the variables significantly
> associated *(p < 0.15)* with y in the bivariate analyses, without having to
> look up those p-values manually.
>
>
>
> So, here is what I have for the moment.
>
>
>
> First, I use this data set:
>
>
>
> tolerance <- read.csv("http://www.ats.ucla.edu/stat/r/examples/alda/data/tolerance1.txt")
>
>
>
> Second, I used this command, allowing me to extract p-values later:
>
>
>
> lmp <- function (modelobject) {
>
>              if (class(modelobject) != "lm") stop("Not an object of class 'lm' ")
>
>              f <- summary(modelobject)$fstatistic
>
>              p <- pf(f[1],f[2],f[3],lower.tail=F)
>
>              attributes(p) <- NULL
>
>              return(p)}
>
>
>
> Third, I did my bivariate linear regressions:
>
>
>
> fit   = lm(exposure~tol11, data = tolerance)
>
> fit_2 = lm(exposure~tol12, data= tolerance)
>
> fit_3 = lm(exposure~tol13, data= tolerance)
>
> fit_4 = lm(exposure~tol14, data= tolerance)
>
> fit_5 = lm(exposure~tol15, data= tolerance)
>
>
>
> Fourth, I extracted p-values:
>
>
>
> lmp(fit)
>
> lmp(fit_2)
>
> lmp(fit_3)
>
> lmp(fit_4)
>
> lmp(fit_5)
>
>
>
> Fifth, I confirmed that the p-values were OK (just to be sure, since it's
> the first time I have used the above procedure):
>
>
>
> summary (fit)
>
> summary (fit_2)
>
> summary (fit_3)
>
> summary (fit_4)
>
> summary (fit_5)
>
>
>
> And now I don't know what to do.
>
>
>
> The multivariate linear regression (if all variables were included) is:
>
>
>
> fit_multi = lm (exposure ~ tol11 + tol12 + tol13 + tol14 + tol15, data = tolerance)
>
>
>
> I would like to be able to do something like:
>
>
> fit_multi = lm (exposure ~ tol11 [include only if lmp(fit) < 0.15] +
> tol12 [include only if lmp(fit_2) < 0.15] + tol13 [include only if
> lmp(fit_3) < 0.15] + tol14 [include only if lmp(fit_4) < 0.15] +
> tol15 [include only if lmp(fit_5) < 0.15], data = tolerance)
>
>
>
> Any idea?
>

(Thanks for providing reproducible code!)

It seems to me that you're just missing two things:

1. a way to determine the names of the variables to be included
    in the multiple (not 'multivariate' to be nitpicky) regression;

2. a way to build the formula for the multiple regression once
    you know which predictors to include.

To get the variables:

   varnames <- names(tolerance)[2:6]
   pvec <- c(lmp(fit), lmp(fit_2), lmp(fit_3), lmp(fit_4), lmp(fit_5))
   use <- varnames[pvec < 0.15]
   use
   #[1] "tol14" "tol15"
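
If there are many candidate predictors, the separate fits and the hand-built
pvec could probably be replaced by a loop. The following is only a sketch
under assumptions: the data frame dat and its columns x1, x2, x3 and y are
simulated stand-ins (not the tolerance data), and lmp() is rewritten slightly
so that the block runs on its own. With the real data one would substitute
tolerance, "exposure" and names(tolerance)[2:6].

```r
# Simulated stand-in data (hypothetical; not the 'tolerance' data set)
set.seed(1)
n <- 50
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 2 * dat$x1 + rnorm(n)   # only x1 is truly related to y

# p-value of the overall F test of an lm fit (same idea as lmp() above)
lmp <- function(fit) {
  stopifnot(inherits(fit, "lm"))
  f <- summary(fit)$fstatistic
  unname(pf(f[1], f[2], f[3], lower.tail = FALSE))
}

# One simple regression per candidate predictor, collecting the p-values
varnames <- c("x1", "x2", "x3")
pvec <- sapply(varnames, function(v) {
  lmp(lm(reformulate(v, response = "y"), data = dat))
})

# Keep the predictors that pass the screening threshold
use <- varnames[pvec < 0.15]
```

Because sapply() is given a character vector, pvec comes back named by
predictor, which makes it easy to check which p-value belongs to which
variable.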

To construct the formula:

   rhs <- paste(use, collapse = " + ")
   form <- as.formula(paste("exposure ~", rhs))  # make it a formula object

And then use it:

   fit_multi <- lm(formula = form, data = tolerance)
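
An alternative to pasting the formula together is reformulate() from the
stats package, which builds the formula directly from a character vector of
term labels. The helper below is a hypothetical sketch (build_formula is not
part of the code above); it also covers the case where no predictor passes
the screening, by falling back to an intercept-only model, which is just one
possible choice.

```r
# Hypothetical helper: build 'response ~ predictors' from character vectors
build_formula <- function(response, predictors) {
  if (length(predictors) == 0) {
    # Nothing passed the screening: fall back to an intercept-only model
    as.formula(paste(response, "~ 1"))
  } else {
    reformulate(predictors, response = response)
  }
}

form2 <- build_formula("exposure", c("tol14", "tol15"))
form0 <- build_formula("exposure", character(0))
```

The result could then be passed on as lm(form2, data = tolerance).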

Peter Ehlers


