[R] Trying to replicate error message in subset()
Petr Pikal
petr.pikal at precheza.cz
Tue Feb 13 11:28:32 CET 2007
Hi
On 12 Feb 2007 at 10:42, Michael Rennie wrote:
Date sent: Mon, 12 Feb 2007 10:42:21 -0500
To: "Petr Pikal" <petr.pikal at precheza.cz>, r-help at stat.math.ethz.ch
From: Michael Rennie <mrennie at utm.utoronto.ca>
Subject: Re: [R] Trying to replicate error message in subset()
>
> Okay
>
> First- I apologise for my poor use of terminology (error vs. warning).
>
> Second- thanks for pointing out what I hadn't noticed before- when I
> pass one case for selection to subset, I get all observations. When I
> pass two cases (as a vector), I get every second case for both cases
> (if both are present, if not, I just get every second case for the one
> that is present). Same happens for three cases, as pointed out by Petr
> below.
>
> So, trying the %in% operator, I get slightly different behaviour, but
> the selection still seems dependent on the length of the vector given
> as a selector:
>
> > b<-c("D", "F", "A")
> > new2.dat<-subset(ex.dat, a%in%x1)
> > new2.dat
> y1 x1 x2
> 1 2.34870479 A B
> 3 1.66090055 A B
> 5 -0.07904798 A B
> 7 2.07053656 A B
> 9 2.97980444 A B
> .....
>
> Now, I just get every second observation, over all cases of x1.
> Probably doesn't get as far as A because F is not present?
Are you completely sure?
I get
> table(ex.dat$x1)
A B C D E
40 40 40 40 40
> table(new.dat$x1)
A B C D E
40 0 0 40 0
so I get all cases for A and D with a subset like this:
a<-c("D", "F", "A")
new.dat<-subset(ex.dat, x1 %in% a)
and ex.dat constructed according to your example.
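One possible reading of the difference, offered as a sketch and not confirmed anywhere in this thread: the call quoted above was written as a%in%x1, which returns one TRUE/FALSE per element of a, not one per row of ex.dat. If a still held the two-element c("D", "F") from the first post (an assumption, since b was the vector just assigned), the result is TRUE FALSE, and subset() recycles it along the rows, which would give rows 1, 3, 5, ... as shown. The same recycling would explain why x1 == a gave no warning with two elements, since 200 rows is a multiple of 2, but did with three. The set.seed() call and the names a2, good, bad and eq are added here only for illustration.

```r
# Rebuild ex.dat as in the original post. x2 has length 200, so cbind()
# recycles y1 and x1 and the data frame ends up with 200 rows.
set.seed(1)  # assumption: added only so the sketch is repeatable
y1 <- rnorm(100, 2)
x1 <- rep(1:5, each = 20)
x2 <- rep(1:2, each = 10, times = 10)
ex.dat <- data.frame(cbind(y1, x1, x2))
ex.dat$x1 <- factor(ex.dat$x1, labels = c("A", "B", "C", "D", "E"))
ex.dat$x2 <- factor(ex.dat$x2, labels = c("B", "D"))

a  <- c("D", "F", "A")  # the three-element vector used in the reply
a2 <- c("D", "F")       # assumption: the value a may still have held

# Intended test: one TRUE/FALSE per row, so every A and D row is kept.
good <- subset(ex.dat, x1 %in% a)
table(good$x1)          # A 40, B 0, C 0, D 40, E 0

# Reversed test: one TRUE/FALSE per element of a2, here TRUE FALSE.
# The short logical vector is recycled along the rows when indexing.
a2 %in% ex.dat$x1
bad <- subset(ex.dat, a2 %in% x1)
head(rownames(bad), 5)  # "1" "3" "5" "7" "9"

# == pairs elements up and recycles the shorter vector. 200 is a multiple
# of 2, so there is no warning, but only every second D row is matched.
eq <- subset(ex.dat, x1 == a2)
nrow(eq)                # 20, not 40
```

With a vector whose length does not divide the number of rows, such as the three-element a, the == version gives the "longer object length is not a multiple of shorter object length" warning quoted further down.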
If I get something that I do not expect:
1. I check if my data are what they should be
2. I check that the search path and working directory do not contain
objects with conflicting names
3. If my functions are complicated I try to look at how their parts
really work
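The three checks above could look roughly like this in an R session. This is only a sketch: the small data frame is a hypothetical stand-in for ex.dat, and in_rows and in_a are names made up for the illustration.

```r
# Small stand-in for the data in this thread (hypothetical values).
ex.dat <- data.frame(y1 = c(2.1, 1.7, 2.4, 1.9),
                     x1 = factor(c("A", "B", "D", "D")))
a <- c("D", "F")        # the vector actually used in the subset() call
b <- c("D", "F", "A")   # a newer vector that was assigned but never used

# 1. Are the data what they should be?
str(ex.dat)
levels(ex.dat$x1)
nrow(ex.dat)

# 2. Is there an object with a conflicting name, and what does it hold?
search()          # the search path
ls()              # objects in the current environment
utils::find("a")  # where on the search path an object called "a" is found
getwd()           # working directory, where a saved workspace may come from
a                 # print it to see which values are really in use

# 3. How do the parts of the call behave on their own?
in_rows <- ex.dat$x1 %in% a   # one value per row of ex.dat
in_a    <- a %in% ex.dat$x1   # one value per element of a
length(in_rows)
length(in_a)
```

A subset condition whose length differs from nrow(ex.dat) is a hint that it will be recycled along the rows.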
If everything seems OK and the unexpected behaviour still occurs, I go
through the documentation and the help archives, and finally I try to
seek advice from the help list.
I must say that this is a bit time-consuming, but I usually learn a
lot from the mistakes which I am able to resolve myself.
HTH
Petr
>
> According to the documentation on subset(), the function works on rows
> of dataframes. I'm guessing this explains the behaviour I'm seeing-
> somehow, the length of the vector being passed as the subset argument
> is dictating which rows to evaluate. So, can anyone offer advice on
> how to select EVERY instance for multiple cases in a dataframe (i.e.,
> all cases of both A and D from ex.dat), or will subset always be tied
> to the length of the 'subset' argument when a vector is passed to it?
>
> Cheers,
>
> Mike
>
>
> At 02:46 AM 12/02/2007, Petr Pikal wrote:
> >Hi
> >
> >it is not an error, it is just a warning (the beeping of a tea kettle
> >with boiling water is also not an error :-) and it tells you pretty
> >explicitly what is wrong: see the lengths of your objects
> >
> > > a<-c("D", "F", "A")
> > > new.dat<-subset(ex.dat, x1 == a)
> >Warning messages:
> >1: longer object length
> > is not a multiple of shorter object length in: is.na(e1) |
> >is.na(e2)
> >2: longer object length
> > is not a multiple of shorter object length in:
> >`==.default`(x1, a)
> > > new.dat
> > y1 x1 x2
> >3 0.5977786 A B
> >6 2.5470739 A B
> >9 0.9128595 A B
> >12 1.0953531 A D
> >15 2.4984470 A D
> >18 1.7289529 A D
> >61 -0.4848938 D B
> >6
> >
> >you can do better with %in% operator.
> >
> >HTH
> >Petr
> >
> >
> >
> >On 12 Feb 2007 at 1:51, Michael Rennie wrote:
> >
> >Date sent: Mon, 12 Feb 2007 01:51:54 -0500
> >To: r-help at stat.math.ethz.ch
> >From: Michael Rennie <mrennie at utm.utoronto.ca>
> >Subject: [R] Trying to replicate error message in
> >subset()
> >
> > >
> > > Hi, there
> > >
> > > I am trying to replicate an error message in subset() to see what
> > > it is that I'm doing wrong with the datasets I am trying to work
> > > with.
> > >
> > > Essentially, I am trying to pass a string vector to subset() in
> > > order to select a specific collection of cases (i.e., I have data
> > > for these cases in one table, and want to select data from another
> > > table that match up with the cases in the first table).
> > >
> > > The error I get is as follows:
> > >
> > > Warning messages:
> > > 1: longer object length
> > > is not a multiple of shorter object length in: is.na(e1)
> > > | is.na(e2)
> > > 2: longer object length
> > > is not a multiple of shorter object length in:
> > > `==.default`(LAKE, g)
> > >
> > > Here is an example case I've been working with (which works) that
> > > I've been trying to "break" such that I can get this error message
> > > to figure out what I am doing wrong in my case.
> > >
> > > y1<-rnorm(100, 2)
> > > x1<-rep(1:5, each=20)
> > > x2<-rep(1:2, each=10, times=10)
> > >
> > > ex.dat<-data.frame(cbind(y1,x1,x2))
> > >
> > >
> > > ex.dat$x1<-factor(ex.dat$x1, labels=c("A", "B", "C", "D", "E"))
> > > ex.dat$x2<-factor(ex.dat$x2, labels=c("B", "D"))
> > >
> > > a<-c("D", "F")
> > > a
> > >
> > > new.dat<-subset(ex.dat, x1 == a)
> > > new.dat
> > >
> > > I thought maybe I was getting errors because I had cases in my
> > > selection vector ('a' in this case) that weren't in my ex.dat
> > > list, but subset handles this fine and just gives me what it can
> > > find in the larger list.
> > >
> > > Any thoughts on how I can replicate the error? As far as I can
> > > tell, the only difference between the case where I am getting
> > > errors and the example above is that the levels of x1 in my case
> > > are words (i.e., "Smelly", "Howdy"), but strings are strings,
> > > aren't they?
> > >
> > > Mike
> > >
> > >
> > > Michael Rennie
> > > Ph.D. Candidate, University of Toronto at Mississauga
> > > 3359 Mississauga Rd. N.
> > > Mississauga, ON L5L 1C6
> > > Ph: 905-828-5452 Fax: 905-828-3792
> > > www.utm.utoronto.ca/~w3rennie
> > >
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html and provide commented,
> > > minimal, self-contained, reproducible code.
> >
> >Petr Pikal
> >petr.pikal at precheza.cz
>
>
Petr Pikal
petr.pikal at precheza.cz