[R] Trying to replicate error message in subset()

Mon Feb 12 16:42:21 CET 2007

Okay

First- I apologise for my poor use of terminology (error vs. warning).

Second- thanks for pointing out what I hadn't noticed before- when I pass 
one case for selection to subset, I get all observations. When I pass two 
cases (as a vector), I get every second case for both cases (if both are 
present, if not, I just get every second case for the one that is present). 
Same happens for three cases, as pointed out by Petr below.

So, trying the %in% operator, I get slightly different behaviour, but the 
selection still seems dependent on the length of the vector given as a 
selector:

 > b<-c("D", "F", "A")
 > new2.dat<-subset(ex.dat, a%in%x1)
 > new2.dat
              y1 x1 x2
1    2.34870479  A  B
3    1.66090055  A  B
5   -0.07904798  A  B
7    2.07053656  A  B
9    2.97980444  A  B
.....

Now I just get every second observation, over all cases of x1. Perhaps it 
doesn't get as far as A because F is not present?
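
A minimal sketch of what is probably happening here, assuming 'a' still 
held c("D", "F") from the earlier example (the vector b defined above is 
never used in the call). With the operands in this order, a %in% x1 returns 
one TRUE/FALSE per element of a, not one per row, and that length-2 result 
is recycled over the rows. The data frame below is a simplified 100-row 
stand-in for ex.dat, not the original object.

```r
# Simplified stand-in for ex.dat: 100 rows, x1 has levels A-E (20 rows each).
ex.dat <- data.frame(
  y1 = rnorm(100, 2),
  x1 = factor(rep(c("A", "B", "C", "D", "E"), each = 20))
)
a <- c("D", "F")  # assumption: the vector from the earlier post

# Operands reversed: one answer per element of 'a', not per row of ex.dat.
sel <- a %in% ex.dat$x1
print(sel)                       # TRUE FALSE ("D" is a level of x1, "F" is not)

# subset() indexes the rows with this length-2 logical vector, which is
# recycled as TRUE, FALSE, TRUE, FALSE, ... so every odd-numbered row is
# kept, whatever its x1 value.
recycled <- subset(ex.dat, a %in% x1)
print(head(rownames(recycled)))  # "1" "3" "5" "7" "9" "11"
```

So the pattern most likely comes from "F" being absent (the FALSE in 
position 2), not from the function stopping before it reaches A.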

According to the documentation on subset(), the function works on rows of 
dataframes. I'm guessing this explains the behaviour I'm seeing- somehow, 
the length of the vector being passed as the subset argument is dictating 
which rows to evaluate.  So, can anyone offer advice on how to select EVERY 
instance for multiple cases in a dataframe (i.e., all cases of both A and D 
from ex.dat), or will subset always be tied to the length of the 'subset' 
argument when a vector is passed to it?
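
One way to select every row for several cases, sketched on a simplified 
100-row stand-in for ex.dat: put the data frame column on the left of %in% 
and the vector of wanted values on the right, so the test yields one 
TRUE/FALSE per row and nothing is recycled. The name 'wanted' is made up 
for this illustration.

```r
# Simplified stand-in for ex.dat: 100 rows, x1 has levels A-E (20 rows each).
ex.dat <- data.frame(
  y1 = rnorm(100, 2),
  x1 = factor(rep(c("A", "B", "C", "D", "E"), each = 20))
)

# "F" is not a level of x1; it simply matches no rows.
wanted <- c("A", "D", "F")

# Column on the left: the result has one element per row of ex.dat.
new.dat <- subset(ex.dat, x1 %in% wanted)
print(table(new.dat$x1))   # A and D keep all 20 rows each; B, C, E have 0

# Optional: drop the now-empty factor levels B, C and E.
new.dat$x1 <- factor(new.dat$x1)
print(levels(new.dat$x1))  # "A" "D"
```

Because x1 %in% wanted is always as long as x1, the result no longer 
depends on how many values are in the selection vector.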

Cheers,

Mike

At 02:46 AM 12/02/2007, Petr Pikal wrote:
>Hi
>
>It is not an error, it is just a warning (a beeping tea kettle with boiling
>water is also not an error :-) ),
>and it tells you pretty explicitly what is wrong:
>see the length of your objects.
>
> > a<-c("D", "F", "A")
> > new.dat<-subset(ex.dat, x1 == a)
>Warning messages:
>1: longer object length
>         is not a multiple of shorter object length in: is.na(e1) |
>is.na(e2)
>2: longer object length
>         is not a multiple of shorter object length in:
>`==.default`(x1, a)
> > new.dat
>             y1 x1 x2
>3    0.5977786  A  B
>6    2.5470739  A  B
>9    0.9128595  A  B
>12   1.0953531  A  D
>15   2.4984470  A  D
>18   1.7289529  A  D
>61  -0.4848938  D  B
>6
>
>you can do better with %in% operator.
>
>HTH
>Petr
>
>
>
>On 12 Feb 2007 at 1:51, Michael Rennie wrote:
>
>Date sent:              Mon, 12 Feb 2007 01:51:54 -0500
>To:                     r-help at stat.math.ethz.ch
>From:                   Michael Rennie <mrennie at utm.utoronto.ca>
>Subject:                [R] Trying to replicate error message in subset()
>
> >
> > Hi, there
> >
> > I am trying to replicate an error message in subset() to see what it
> > is that I'm doing wrong with the datasets I am trying to work with.
> >
> > Essentially, I am trying to pass a string vector to subset() in order
> > to select a specific collection of cases (i.e., I have data for these
> > cases in one table, and want to select data from another table that
> > match up with the cases in the first table).
> >
> > The error I get is as follows:
> >
> > Warning messages:
> > 1: longer object length
> >          is not a multiple of shorter object length in: is.na(e1) |
> >          is.na(e2)
> > 2: longer object length
> >          is not a multiple of shorter object length in:
> >          `==.default`(LAKE, g)
> >
> > Here is an example case I've been working with (which works) that I've
> > been trying to "break" such that I can get this error message to figure
> > out what I am doing wrong in my case.
> >
> > y1<-rnorm(100, 2)
> > x1<-rep(1:5, each=20)
> > x2<-rep(1:2, each=10, times=10)
> >
> > ex.dat<-data.frame(cbind(y1,x1,x2))
> >
> >
> > ex.dat$x1<-factor(ex.dat$x1, labels=c("A", "B", "C", "D", "E"))
> > ex.dat$x2<-factor(ex.dat$x2, labels=c("B", "D"))
> >
> > a<-c("D", "F")
> > a
> >
> > new.dat<-subset(ex.dat, x1 == a)
> > new.dat
> >
> > I thought maybe I was getting errors because I had cases in my
> > selection vector ('a' in this case) that weren't in my ex.dat list,
> > but subset handles this fine and just gives me what it can find in the
> > larger list.
> >
> > Any thoughts on how I can replicate the error? As far as I can tell,
> > the only difference between the case where I am getting errors and the
> > example above is that the levels of x1 in my case are words (i.e.,
> > "Smelly", "Howdy"), but strings are strings, aren't they?
> >
> > Mike
> >
> >
> > Michael Rennie
> > Ph.D. Candidate, University of Toronto at Mississauga
> > 3359 Mississauga Rd. N.
> > Mississauga, ON  L5L 1C6
> > Ph: 905-828-5452  Fax: 905-828-3792
> > www.utm.utoronto.ca/~w3rennie
> >
>
>Petr Pikal
>petr.pikal at precheza.cz
