[R] Defining partial list of variables
Bert Gunter
Tue Jan 5 17:15:52 CET 2021
I may not properly understand the context of this discussion and, in
particular, what the my.formula() function does. But if I do, the following,
from ?formula, seems relevant and would indicate that the discussion is
unnecessary:
"There are two special interpretations of . in a formula. The usual one is
in the context of a data argument of model fitting functions and means ‘all
columns not otherwise in the formula’:"
This means you can fit different models just by indexing the columns -- by
number -- you wish to use in a data argument, viz:
y <- runif(100)
dat <- data.frame(matrix(runif(500), ncol = 5))
names(dat) <- letters[1:5]
head(dat)
## Use columns 1,3, and 5 only
mdl1 <- lm(y ~ ., data = dat[,c(1,3,5)])
## Result:
summary(mdl1)
Call:
lm(formula = y ~ ., data = dat[, c(1, 3, 5)])

Residuals:
     Min       1Q   Median       3Q      Max
-0.52334 -0.27494  0.01245  0.28637  0.51998

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.51461    0.08236   6.248 1.14e-08 ***
a            0.01516    0.10928   0.139    0.890
c            0.03517    0.10399   0.338    0.736
e           -0.09437    0.10967  -0.861    0.392
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.299 on 96 degrees of freedom
Multiple R-squared:  0.008256,  Adjusted R-squared:  -0.02274
F-statistic: 0.2664 on 3 and 96 DF,  p-value: 0.8495
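
The same idea also works with column names instead of column numbers. A
minimal sketch, assuming (unlike the example above) that the response y is
stored in the data frame itself; the seed and the name keep are added here
only for illustration:

```r
set.seed(1)  # assumption: seed added only to make the sketch reproducible
dat <- data.frame(matrix(runif(500), ncol = 5))
names(dat) <- letters[1:5]
dat$y <- runif(100)  # response kept in the data frame

keep <- c("a", "c", "e")  # hypothetical name for the predictors to use
mdl2 <- lm(y ~ ., data = dat[, c("y", keep)])
coef_names <- names(coef(mdl2))
print(coef_names)
```

Because y is part of the subset, the dot expands to the remaining columns only.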
If I have misunderstood and this is unhelpful, just ignore without comment.
You don't need to waste time explaining it to me.
On Tue, Jan 5, 2021 at 4:49 AM Heinz Tuechler wrote:
> What about the Cs()-function in Hmisc?
> library(Hmisc)
> Cs(a,b,c)
> [1] "a" "b" "c"
>
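
For readers without Hmisc, the effect of Cs() can be approximated in base R.
A rough sketch; cs() below is a hypothetical helper written for this note,
not the Hmisc implementation:

```r
# Hypothetical helper: capture unquoted names and return them as strings.
cs <- function(...) {
  args <- as.list(substitute(list(...)))[-1]  # drop the leading `list` symbol
  vapply(args, deparse, character(1))
}
vars <- cs(a, b, c)
print(vars)
```

Jeff Newmiller's objection to NSE further down the thread applies to this
helper as well.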
> Steven Yen wrote on 05.01.2021 13:29:
> > Thanks Eric. Yes, "unlist" makes a difference. Below, I am not doing a
> > regression but a summary, to keep the example simple.
> >
> > > set.seed(123)
> > > data<-matrix(runif(1:25),nrow=5)
> > > colnames(data)<-c("x1","x2","x3","x4","x5"); data
> >             x1        x2        x3         x4        x5
> > [1,] 0.2875775 0.0455565 0.9568333 0.89982497 0.8895393
> > [2,] 0.7883051 0.5281055 0.4533342 0.24608773 0.6928034
> > [3,] 0.4089769 0.8924190 0.6775706 0.04205953 0.6405068
> > [4,] 0.8830174 0.5514350 0.5726334 0.32792072 0.9942698
> > [5,] 0.9404673 0.4566147 0.1029247 0.95450365 0.6557058
> > > j<-strsplit(gsub("[\n ]","","x1,x3,x5"),",")
> > > j<-unlist(j); j
> > [1] "x1" "x3" "x5"
> > > summary(data[,j])
> >        x1               x3               x5
> >  Min.   :0.2876   Min.   :0.1029   Min.   :0.6405
> >  1st Qu.:0.4090   1st Qu.:0.4533   1st Qu.:0.6557
> >  Median :0.7883   Median :0.5726   Median :0.6928
> >  Mean   :0.6617   Mean   :0.5527   Mean   :0.7746
> >  3rd Qu.:0.8830   3rd Qu.:0.6776   3rd Qu.:0.8895
> >  Max.   :0.9405   Max.   :0.9568   Max.   :0.9943
> >
On 2021/1/5 Eric Berger wrote:
> >> wrap it in unlist
> >>
> >> xx <- unlist(strsplit( .... ))
> >>
> >>
> >>
On Tue, Jan 5, 2021 at 12:59 PM Steven Yen wrote:
> >>
> >> Thanks Eric. Perhaps I should know when to stop. The approach
> >> produces a slightly different variable list (note the [[1]]).
> >> Consequently, I was not able to use xx in defining my regression
> >> formula.
> >>
> >> > x<-colnames(subset(mydata,select=c(
> >>
> >> + hhsize,urban,male,
> >> + age3045,age4659,age60, # age1529
> >> + highsc,tert, # primary
> >> + gov,nongov, # unemp
> >> + married))); x
> >> [1] "hhsize" "urban" "male" "age3045" "age4659" "age60" "highsc" "tert"
> >> [9] "gov" "nongov" "married"
> >> > xx<-strsplit(gsub("[\n ]","",
> >> + "hhsize,urban,male,
> >> + age3045,age4659,age60,
> >> + highsc,tert,
> >> + gov,nongov,
> >> + married"
> >> + ),","); xx
> >> [[1]]
> >> [1] "hhsize" "urban" "male" "age3045" "age4659" "age60" "highsc" "tert"
> >> [9] "gov" "nongov" "married"
> >>
> >> > eq1<-my.formula(y="cig",x=x); eq1
> >> cig ~ hhsize + urban + male + age3045 + age4659 + age60 + highsc +
> >> tert + gov + nongov + married
> >> > eq2<-my.formula(y="cig",x=xx); eq2
> >> cig ~ c("hhsize", "urban", "male", "age3045", "age4659", "age60",
> >> "highsc", "tert", "gov", "nongov", "married")
> >>
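
The difference between eq1 and eq2 comes from strsplit() returning a list
rather than a character vector. A small sketch of the effect, assuming the
formula is assembled with paste(); how my.formula() actually builds it is not
shown in the thread:

```r
xx <- strsplit("hhsize,urban,male", ",")         # a list of length 1
rhs_list <- paste(xx, collapse = " + ")          # the list element is deparsed whole
rhs_vec  <- paste(unlist(xx), collapse = " + ")  # names joined one by one
print(rhs_list)
print(rhs_vec)
eq <- as.formula(paste("cig ~", rhs_vec))
print(eq)
```

Using xx[[1]] instead of unlist(xx) gives the same character vector here.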
On 2021/1/5 Eric Berger wrote:
> >>> If your column names have no spaces the following should work
> >>>
> >>> x<-strsplit(gsub("[\n ]","",
> >>> "hhsize,urban,male,
> >>> gov,nongov,married"),","); x
> >>>
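
If names might contain spaces, an alternative is to trim only the whitespace
around each piece instead of deleting every space. A sketch, assuming
R >= 3.2.0 for trimws():

```r
vars <- "hhsize,urban,male,
         gov,nongov,married"
# [[1]] extracts the character vector from the list strsplit() returns;
# trimws() strips the leading newline and indentation from each piece.
x <- trimws(strsplit(vars, ",")[[1]])
print(x)
```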
On Tue, Jan 5, 2021 at 11:47 AM Steven Yen wrote:
> >>>
> >>> Here we go! BUT, it works great for a continuous line. With
> >>> line break(s), I got the nuisance "\n" inserted.
> >>>
> >>> > x<-strsplit("hhsize,urban,male,gov,nongov,married",","); x
> >>> [[1]]
> >>> [1] "hhsize" "urban" "male" "gov" "nongov" "married"
> >>>
> >>> > x<-strsplit("hhsize,urban,male,
> >>> + gov,nongov,married",","); x
> >>> [[1]]
> >>> [1] "hhsize" "urban" "male" "\n gov"
> >>> [5] "nongov" "married"
> >>>
On 2021/1/5 Eric Berger wrote:
> >>>>
> >>>> zx<-strsplit("age,exercise,income,white,black,hispanic,base,somcol,grad,employed,unable,homeowner,married,divorced,widowed",",")
> >>>>
> >>>>
> >>>>
On Tue, Jan 5, 2021 at 11:01 AM Steven Yen wrote:
> >>>>
> >>>> Thank you, Jeff. IMO, we are all here to make R work better to suit
> >>>> our various needs. All I am asking is an easier way to define variable
> >>>> list zx, differently from the way z0, x0, and treat are defined.
> >>>>
> >>>> > zx<-colnames(subset(mydata,select=c(
> >>>> + age,exercise,income,white,black,hispanic,base,somcol,grad,employed,
> >>>> + unable,homeowner,married,divorced,widowed)))
> >>>> > z0<-c("fruit","highblood")
> >>>> > x0<-c("vgood","poor")
> >>>> > treat<-"depression"
> >>>> > eq1 <-my.formula(y="depression",x=zx,z0)
> >>>> > eq2 <-my.formula(y="bmi", x=zx,x0)
> >>>> > eq2t<-my.formula(y="bmi", x=zx,treat)
> >>>> > eqs<-list(eq1,eq2); eqs
> >>>> [[1]]
> >>>> depression ~ age + exercise + income + white + black + hispanic +
> >>>>     base + somcol + grad + employed + unable + homeowner + married +
> >>>>     divorced + widowed + fruit + highblood
> >>>>
> >>>> [[2]]
> >>>> bmi ~ age + exercise + income + white + black + hispanic + base +
> >>>>     somcol + grad + employed + unable + homeowner + married +
> >>>>     divorced + widowed + vgood + poor
> >>>>
> >>>> > eqt<-list(eq1,eq2t); eqt
> >>>> [[1]]
> >>>> depression ~ age + exercise + income + white + black + hispanic +
> >>>>     base + somcol + grad + employed + unable + homeowner + married +
> >>>>     divorced + widowed + fruit + highblood
> >>>>
> >>>> [[2]]
> >>>> bmi ~ age + exercise + income + white + black + hispanic + base +
> >>>>     somcol + grad + employed + unable + homeowner + married +
> >>>>     divorced + widowed + depression
> >>>>
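
The thread never shows my.formula(). A hypothetical stand-in with the same
call pattern can be sketched with base R's reformulate(); the definition
below is an assumption, not Steven Yen's function, and the variable list is
shortened:

```r
# Hypothetical stand-in for my.formula(): y is the response name,
# x and ... are character vectors of predictor names.
my_formula <- function(y, x, ...) {
  reformulate(c(x, ...), response = y)
}
zx <- c("age", "exercise", "income")
z0 <- c("fruit", "highblood")
eq1 <- my_formula(y = "depression", x = zx, z0)
print(eq1)
```

reformulate() expects a character vector of term labels, which is why a list
returned by strsplit() has to be unlisted first.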
On 2021/1/5 Jeff Newmiller wrote:
> >>>> > IMO if you want to hardcode a formula then simply hardcode a formula.
> >>>> > If you want 20 formulas, write 20 formulas. Is that really so bad?
> >>>> >
> >>>> > If you want to have an abbreviated way to specify sets of variables
> >>>> > without conforming to R syntax then put them into data files and read
> >>>> > them in using a format of your choice.
> >>>> >
> >>>> > But using NSE to avoid using quotes for entering what amounts to
> >>>> > in-script data is abuse of the language justified by laziness... the
> >>>> > amount of work you put yourself and anyone else who reads your code
> >>>> > through is excessive relative to the benefit gained.
> >>>> >
> >>>> > NSE has its strengths... but as a method of creating data objects it
> >>>> > sucks. Note that even the tidyverse (now) requires you to use quotes
> >>>> > when you are not directly referring to something that already exists.
> >>>> > And if you were... you might as well be creating a formula.
> >>>> >
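
Reading the names from a file, as suggested above, might look like this. A
sketch using a temporary file with one name per line; the file layout is an
assumption:

```r
f <- tempfile(fileext = ".txt")  # stand-in for a real variable-list file
writeLines(c("age", "exercise", "income"), f)
zx <- readLines(f)               # character vector, one element per line
unlink(f)                        # remove the temporary file
print(zx)
```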
On January 4, 2021 11:14:54 PM PST, Steven Yen wrote:
> >>>> >> I constantly define variable lists from a data frame (e.g., to
> >>>> >> define a regression equation). Line 3 below does just that. Placing
> >>>> >> each variable name in quotation marks is too much work especially
> >>>> >> for a long list so I do that with line 4. Is there an easier way to
> >>>> >> accomplish this: to define a list of variable names containing
> >>>> >> "a","c","e"? Thank you!
> >>>> >>
> >>>> >>> data<-as.data.frame(matrix(1:30,nrow=6))
> >>>> >>> colnames(data)<-c("a","b","c","d","e"); data
> >>>> >> a b c d e
> >>>> >> 1 1 7 13 19 25
> >>>> >> 2 2 8 14 20 26
> >>>> >> 3 3 9 15 21 27
> >>>> >> 4 4 10 16 22 28
> >>>> >> 5 5 11 17 23 29
> >>>> >> 6 6 12 18 24 30
> >>>> >>> x1<-c("a","c","e"); x1 # line 3
> >>>> >> [1] "a" "c" "e"
> >>>> >>> x2<-colnames(subset(data,select=c(a,c,e))); x2 # line 4
> >>>> >> [1] "a" "c" "e"
> >>>> >>
