[R] subset using noncontiguous variables by name (not index)

Muenchen, Robert A (Bob) muenchen at utk.edu
Mon Aug 27 15:39:03 CEST 2007


Gabor, That works great!

I think this would be a very helpful addition to the main R
distribution. Perhaps with a single colon representing numerical order
(exactly as you have written it) and two colons representing the order
of the variables as they appear in the data frame (your first example).
That's analogous to SAS' x1-xN, which you know in advance selects exactly those N variables,
and a--z, which selects an unknown number of variables, from a through z. How
many that is depends upon their order in the data frame. That would not
only be very useful in general, but it would also make transitioning to
R from SAS or SPSS less confusing.
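Bob's a--z idea (select every variable from one name through another, in data-frame order) can be approximated with an ordinary function. The sketch below is hypothetical: `vars_between()` is not part of base R or any package. It simply looks up both names with match() and returns the column names that lie between them.

```r
# Hypothetical helper (not part of R): return the names of all columns
# from `from` through `to`, in the order they appear in the data frame,
# similar in spirit to SAS's a--z variable list.
vars_between <- function(df, from, to) {
  nms <- names(df)
  i <- match(from, nms)
  j <- match(to, nms)
  if (is.na(i) || is.na(j)) stop("both names must be columns of the data frame")
  if (i > j) stop("'from' must come before 'to' in the data frame")
  nms[i:j]
}

# anscombe stores its columns as x1 x2 x3 x4 y1 y2 y3 y4, so this
# selects x3, x4, y1 and y2 purely by position.
vars_between(anscombe, "x3", "y2")
head(anscombe[vars_between(anscombe, "x3", "y2")])
```

Because it returns plain names, the result can be stored in a variable and combined with c() like any other character vector.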

Is R still being extended in such basic ways, or does that muck up
existing programs too much?

Thanks,
Bob

> -----Original Message-----
> From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
> Sent: Sunday, August 26, 2007 8:52 PM
> To: Muenchen, Robert A (Bob)
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] subset using noncontiguous variables by name (not index)
> 
> Try this:
> 
> > "%:%" <- function(x, y) {
> +    prex <- gsub("[0-9]", "", x); postx <- gsub("[^0-9]", "", x)
> +    prey <- gsub("[0-9]", "", y); posty <- gsub("[^0-9]", "", y)
> +    stopifnot(prex == prey)
> +    paste(prex, seq(from = as.numeric(postx), to = as.numeric(posty)), sep = "")
> + }
> > "x2" %:% "x4"
> [1] "x2" "x3" "x4"
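A minimal sketch of how the operator above can feed an ordinary name-based selection. The definition is repeated so the block runs on its own; it assumes every name is a letter prefix followed by a single run of digits (a name such as x1a2 would be split incorrectly).

```r
# Gabor's %:% operator, repeated here so this block is self-contained.
"%:%" <- function(x, y) {
  prex <- gsub("[0-9]", "", x); postx <- gsub("[^0-9]", "", x)
  prey <- gsub("[0-9]", "", y); posty <- gsub("[^0-9]", "", y)
  stopifnot(prex == prey)
  paste(prex, seq(from = as.numeric(postx), to = as.numeric(posty)), sep = "")
}

# Build a character vector of names; unlike c(x1, x3:x4, y2) it can be
# stored in a variable and reused later.
myVars <- c("x1", "x3" %:% "x4", "y2")
myVars                      # "x1" "x3" "x4" "y2"
head(anscombe[myVars])      # index the data frame by name
```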
> 
> 
> On 8/26/07, Muenchen, Robert A (Bob) <muenchen at utk.edu> wrote:
> > Thanks Bert & Gabor for two very interesting solutions!
> >
> > It would be very handy in R if string1:stringN generated
> > "string1","string2"..."stringN"; it would make selections like this much
> > more obvious. I know it's easy to do with the colon operator and the paste
> > function, but that's quite a step up in complexity compared to SAS'
> > x1 x3-x4 y2 or SPSS' x1,x3 to x4, y2. And it's complexity that beginners
> > face early in learning R.
> >
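For comparison, the paste-based route Bob mentions looks like this in current R. paste0() is paste() with sep = ""; the sketch also relies on subset() accepting a character vector of column names in select=, which current R supports.

```r
# Build the names with the colon operator and paste0().
myVars <- c("x1", paste0("x", 3:4), "y2")
myVars                               # "x1" "x3" "x4" "y2"

# The stored names work with subset()'s select= argument (and with [ too).
res <- subset(anscombe, select = myVars)
head(res)
```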
> > While on the subject of the colon operator, why doesn't anscombe[[1:4]]
> > select the x variables in list form as anscombe[,1:4] or anscombe[1:4]
> > do in data frame form?
> >
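A short sketch of why anscombe[[1:4]] does not behave like anscombe[1:4]: when [[ gets a vector index it indexes recursively, so anscombe[[c(1, 2)]] means anscombe[[1]][[2]]. The as.list() call at the end is one way, among several, to get the x variables as a plain list.

```r
# With a vector index, [[ recurses: this is anscombe[[1]][[2]],
# the second value of x1, not a selection of two columns.
anscombe[[c(1, 2)]]
identical(anscombe[[c(1, 2)]], anscombe$x1[2])   # TRUE

# anscombe[[1:4]] fails: recursion can only continue while the
# intermediate result is a list, and anscombe[[1]] is a numeric vector.
res <- try(anscombe[[1:4]], silent = TRUE)
inherits(res, "try-error")                       # TRUE

# One way to get the x variables as a plain named list.
xs <- as.list(anscombe[1:4])
str(xs)
```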
> > Thanks,
> >
> > Bob
> >
> > =========================================================
> > Bob Muenchen (pronounced Min'-chen), Manager
> > Statistical Consulting Center
> > U of TN Office of Information Technology
> > 200 Stokely Management Center, Knoxville, TN 37996-0520
> > Voice: (865) 974-5230
> > FAX: (865) 974-4810
> > Email: muenchen at utk.edu
> > Web: http://oit.utk.edu/scc,
> > News: http://listserv.utk.edu/archives/statnews.html
> > =========================================================
> >
> >
> > > -----Original Message-----
> > > From: Bert Gunter [mailto:gunter.berton at gene.com]
> > > Sent: Sunday, August 26, 2007 6:50 PM
> > > To: 'Gabor Grothendieck'; Muenchen, Robert A (Bob)
> > > Cc: r-help at stat.math.ethz.ch
> > > Subject: RE: [R] subset using noncontiguous variables by name (not index)
> > >
> > > The problem is that "x3:x5" does not mean what you think it means. The
> > > only reason it does the right thing in subset() is that a clever trick is
> > > used there (read the code -- it's not hard to understand) to ensure that
> > > it does. Gabor has essentially mimicked that trick in his solution.
> > >
> > > However, it is not necessary to do this. You can construct the call
> > > directly as you tried to do. Using the anscombe example, here's how:
> > >
> > > chooz <- "c(x1,x3:x4,y2)"  ## enclose the desired expression in quotes
> > > do.call(subset, list(x = anscombe, select = parse(text = chooz)))
> > >
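Bert's point is that subset() does not evaluate select= normally: it evaluates the expression in a list that maps each column name to its column number (in current R this is visible in the source of subset.data.frame). A minimal sketch of that same trick, using a hypothetical helper named select_by_expr() that takes the expression as a string, like chooz above:

```r
# Hypothetical helper (not part of R) reproducing the idea subset() uses.
select_by_expr <- function(df, expr_text) {
  # Bind each column name to its position, e.g. x1 = 1, x3 = 3, y2 = 6.
  positions <- as.list(seq_along(df))
  names(positions) <- names(df)
  # Parse the string into one expression and evaluate it with those
  # bindings, so "x3:x4" is computed as 3:4.
  expr <- parse(text = expr_text)[[1]]
  idx <- eval(expr, positions, parent.frame())
  df[idx]
}

chooz <- "c(x1, x3:x4, y2)"
head(select_by_expr(anscombe, chooz))
```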
> > > -- Bert Gunter
> > > Genentech Non-Clinical Statistics
> > > South San Francisco, CA
> > >
> > > "The business of the statistician is to catalyze the scientific learning
> > > process."  - George E. P. Box
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: r-help-bounces at stat.math.ethz.ch
> > > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Gabor Grothendieck
> > > > Sent: Sunday, August 26, 2007 2:10 PM
> > > > To: Muenchen, Robert A (Bob)
> > > > Cc: r-help at stat.math.ethz.ch
> > > > Subject: Re: [R] subset using noncontiguous variables by name (not index)
> > > >
> > > > Using the built-in data frame anscombe, try this. First we set up a data
> > > > frame anscombe.seq which has one row containing 1, 2, 3, ... . Then we
> > > > select from that data frame and unlist the result to get the desired
> > > > index vector.
> > > >
> > > > > anscombe.seq <- replace(anscombe[1,], TRUE, seq_along(anscombe))
> > > > > idx <- unlist(subset(anscombe.seq, select = c(x1, x3:x4, y2)))
> > > > > anscombe[idx]
> > > >    x1 x3 x4   y2
> > > > 1  10 10  8 9.14
> > > > 2   8  8  8 8.14
> > > > 3  13 13  8 8.74
> > > > 4   9  9  8 8.77
> > > > 5  11 11  8 9.26
> > > > 6  14 14  8 8.10
> > > > 7   6  6  8 6.13
> > > > 8   4  4 19 3.10
> > > > 9  12 12  8 9.13
> > > > 10  7  7  8 7.26
> > > > 11  5  5  8 4.74
> > > >
> > > >
> > > > On 8/26/07, Muenchen, Robert A (Bob) <muenchen at utk.edu> wrote:
> > > > > Hi All,
> > > > >
> > > > > I'm using the subset function to select a list of variables, some of
> > > > > which are contiguous in the data frame, and others of which are not. It
> > > > > works fine when I use the form:
> > > > >
> > > > > subset(mydata,select=c(x1,x3:x5,x7) )
> > > > >
> > > > > In reality, my list is far more complex, so I would like to store it in
> > > > > a variable to substitute for c(x1,x3:x5,x7), but I cannot get that to
> > > > > work. That use of the c function seems to violate R rules, so I'm not
> > > > > sure how it works at all. A small simulation of the problem is below.
> > > > >
> > > > > If the variable names & orders were really this simple, I could use
> > > > > indices like
> > > > >
> > > > > summary( mydata[ ,c(1,3:5,7) ] )
> > > > >
> > > > > but alas, they are not.
> > > > >
> > > > > How does the c function work this way in the first place, and how can I
> > > > > make this substitution?
> > > > >
> > > > > Thanks,
> > > > > Bob
> > > > >
> > > > > mydata <- data.frame(
> > > > >  x1=c(1,2,3,4,5),
> > > > >  x2=c(1,2,3,4,5),
> > > > >  x3=c(1,2,3,4,5),
> > > > >  x4=c(1,2,3,4,5),
> > > > >  x5=c(1,2,3,4,5),
> > > > >  x6=c(1,2,3,4,5),
> > > > >  x7=c(1,2,3,4,5)
> > > > > )
> > > > > mydata
> > > > >
> > > > > # This does what I want.
> > > > > summary(
> > > > >  subset(mydata,select=c(x1,x3:x5,x7) )
> > > > > )
> > > > >
> > > > > # Can I substitute myVars?
> > > > > attach(mydata)
> > > > > myVars1 <- c(x1,x3:x5,x7)
> > > > >
> > > > > # Not looking good!
> > > > > myVars1
> > > > >
> > > > > # This doesn't do the right thing.
> > > > > summary(
> > > > >  subset(mydata,select=myVars1 )
> > > > > )
> > > > >
> > > > > # Total desperation on this attempt:
> > > > > myVars2 <- "x1,x3:x5,x7"
> > > > > myVars2
> > > > >
> > > > > # This doesn't work either.
> > > > > summary(
> > > > >  subset(mydata,select=myVars2 )
> > > > > )
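The attempts above fail because c(x1,x3:x5,x7) is evaluated immediately: after attach(mydata) it combines the column values themselves, and x3:x5 uses only the first element of each vector. One alternative, sketched under the assumption that keeping the selection as an unevaluated expression is acceptable, is to store it with quote() and splice it into the call with do.call(), much like Bert's parse(text = ...) version:

```r
# The example data frame from the question (written with 1:5 for brevity).
mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# Keep the selection unevaluated; no columns are looked up yet.
myVars <- quote(c(x1, x3:x5, x7))

# do.call() places the expression in the call, so subset() receives
# select = c(x1, x3:x5, x7) exactly as if it had been typed.
res <- do.call(subset, list(mydata, select = myVars))
summary(res)
```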
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > ______________________________________________
> > > > > R-help at stat.math.ethz.ch mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > > and provide commented, minimal, self-contained, reproducible code.
> > > > >
> > > >
> >