[R] Trying to replicate error message in subset()

Petr Pikal petr.pikal at precheza.cz
Tue Feb 13 11:28:32 CET 2007


Hi

On 12 Feb 2007 at 10:42, Michael Rennie wrote:

Date sent:      	Mon, 12 Feb 2007 10:42:21 -0500
To:             	"Petr Pikal" <petr.pikal at precheza.cz>, r-help at stat.math.ethz.ch
From:           	Michael Rennie <mrennie at utm.utoronto.ca>
Subject:        	Re: [R] Trying to replicate error message in subset()

> 
> Okay
> 
> First- I apologise for my poor use of terminology (error vs. warning).
> 
> Second- thanks for pointing out what I hadn't noticed before- when I
> pass one case for selection to subset, I get all observations. When I
> pass two cases (as a vector), I get every second case for both cases
> (if both are present, if not, I just get every second case for the one
> that is present). Same happens for three cases, as pointed out by Petr
> below.
> 
> So, trying the %in% operator, I get slightly different behaviour, but
> the selection still seems dependent on the length of the vector given
> as a selector:
> 
>  > b<-c("D", "F", "A")
>  > new2.dat<-subset(ex.dat, a%in%x1)
>  > new2.dat
>               y1 x1 x2
> 1    2.34870479  A  B
> 3    1.66090055  A  B
> 5   -0.07904798  A  B
> 7    2.07053656  A  B
> 9    2.97980444  A  B
> .....
> 
> Now, I just get every second observation, over all cases of x1.
> Probably doesn't get as far as A because F is not present?

Are you completely sure?

I get

> table(ex.dat$x1)

 A  B  C  D  E 
40 40 40 40 40 
> table(new.dat$x1)

 A  B  C  D  E 
40  0  0 40  0 

so I get all cases for A and D with a subset like this:

a<-c("D", "F", "A")
new.dat<-subset(ex.dat, x1 %in% a)

and ex.dat constructed according to your example.
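
The order of the operands around %in% matters here. Below is a minimal sketch, assuming ex.dat is rebuilt as in the original post; the seed and the names a2, got_warning, wrong and right are additions for illustration only. The every-second-row output quoted above would be consistent with the operands being swapped while a still held a two-element vector, though that is only a guess. The sketch also shows why a two-element vector never triggers the warning with ==: the data frame has 200 rows, and 2 divides 200 while 3 does not.

```r
# Rebuild ex.dat as in the original post. x2 has length 200, so cbind()
# recycles y1 and x1 and the data frame ends up with 200 rows.
set.seed(1)  # assumption: seed added only to make the sketch repeatable
y1 <- rnorm(100, 2)
x1 <- rep(1:5, each = 20)
x2 <- rep(1:2, each = 10, times = 10)
ex.dat <- data.frame(cbind(y1, x1, x2))
ex.dat$x1 <- factor(ex.dat$x1, labels = c("A", "B", "C", "D", "E"))
ex.dat$x2 <- factor(ex.dat$x2, labels = c("B", "D"))

a2 <- c("D", "F")       # two elements: 200 is a multiple of 2
a  <- c("D", "F", "A")  # three elements: 200 is not a multiple of 3

# == compares element by element and recycles the shorter vector.
# It recycles silently for a2, and warns for a.
eq2 <- ex.dat$x1 == a2
got_warning <- tryCatch({ ex.dat$x1 == a; FALSE },
                        warning = function(w) TRUE)

# Swapped operands: "is each element of a2 found in x1?" gives
# TRUE, FALSE, which is recycled down the rows (rows 1, 3, 5, ...).
wrong <- subset(ex.dat, a2 %in% x1)

# Intended form: "is each row's x1 found in a?" gives one value per row.
right <- subset(ex.dat, x1 %in% a)
table(right$x1)  # A and D keep 40 rows each, B, C and E drop to 0
```

With x1 on the left, the logical vector has one element per row, so nothing is recycled and the level "F", which is absent from the data, is simply ignored.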

If I get something I do not expect:

1. I check that my data are what they should be.
2. I check that the search path and working directory do not contain
objects with conflicting names.
3. If my functions are complicated, I look at how their parts
really work.
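
The three checks above can be sketched roughly as follows. This is only an illustration: the small ex.dat, the leftover workspace objects x1 and a, and the names clash and sel are hypothetical stand-ins, not the data from this thread.

```r
# Hypothetical leftovers from earlier experiments in the workspace
x1 <- rep(1:5, each = 20)
a <- c("D", "F")
ex.dat <- data.frame(x1 = factor(rep(c("A", "B", "C", "D", "E"), each = 2)))

# 1. Are the data what they should be?
str(ex.dat)        # column types and dimensions
table(ex.dat$x1)   # counts per level

# 2. Are there objects with conflicting names?
#    Column names that also exist as objects in the workspace are suspects.
clash <- intersect(names(ex.dat), ls())
clash
search()           # attached packages and data frames
getwd()            # where a saved workspace would be loaded from

# 3. How do the parts really work?
#    Evaluate the selection on its own before handing it to subset().
sel <- ex.dat$x1 %in% a
length(sel) == nrow(ex.dat)  # should be one logical value per row
```

If the selection vector is shorter than the number of rows, it will be recycled, which is usually a sign that the expression is not doing what was intended.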

If everything seems OK and the unexpected behaviour still occurs, I go 
through the documentation and the help archives, and finally I seek 
advice from the help list.

I must say that this is a bit time-consuming, but I usually learn a 
lot from the mistakes I am able to resolve myself.

HTH
Petr


> 
> According to the documentation on subset(), the function works on rows
> of dataframes. I'm guessing this explains the behaviour I'm seeing-
> somehow, the length of the vector being passed as the subset argument
> is dictating which rows to evaluate.  So, can anyone offer advice on
> how to select EVERY instance for multiple cases in a dataframe (i.e.,
> all cases of both A and D from ex.dat), or will subset always be tied
> to the length of the 'subset' argument when a vector is passed to it?
> 
> Cheers,
> 
> Mike
> 
> 
> At 02:46 AM 12/02/2007, Petr Pikal wrote:
> >Hi
> >
> >it is not an error, it is just a warning (a beeping tea kettle with boiling
> >water is also not an error :-) and it tells you pretty explicitly
> >what is wrong: see the length of your objects
> >
> > > a<-c("D", "F", "A")
> > > new.dat<-subset(ex.dat, x1 == a)
> >Warning messages:
> >1: longer object length
> >         is not a multiple of shorter object length in: is.na(e1) |
> >is.na(e2)
> >2: longer object length
> >         is not a multiple of shorter object length in:
> >`==.default`(x1, a)
> > > new.dat
> >             y1 x1 x2
> >3    0.5977786  A  B
> >6    2.5470739  A  B
> >9    0.9128595  A  B
> >12   1.0953531  A  D
> >15   2.4984470  A  D
> >18   1.7289529  A  D
> >61  -0.4848938  D  B
> >6
> >
> >You can do better with the %in% operator.
> >
> >HTH
> >Petr
> >
> >
> >
> >On 12 Feb 2007 at 1:51, Michael Rennie wrote:
> >
> >Date sent:              Mon, 12 Feb 2007 01:51:54 -0500
> >To:                     r-help at stat.math.ethz.ch
> >From:                   Michael Rennie <mrennie at utm.utoronto.ca>
> >Subject:                [R] Trying to replicate error message in
> >subset()
> >
> > >
> > > Hi, there
> > >
> > > I am trying to replicate an error message in subset() to see what
> > > it is that I'm doing wrong with the datasets I am trying to work
> > > with.
> > >
> > > Essentially, I am trying to pass a string vector to subset() in
> > > order to select a specific collection of cases (i.e., I have data
> > > for these cases in one table, and want to select data from another
> > > table that match up with the cases in the first table).
> > >
> > > The error I get is as follows:
> > >
> > > Warning messages:
> > > 1: longer object length
> > >          is not a multiple of shorter object length in: is.na(e1)
> > >          | is.na(e2)
> > > 2: longer object length
> > >          is not a multiple of shorter object length in:
> > >          `==.default`(LAKE, g)
> > >
> > > Here is an example case I've been working with (which works) that
> > > I've been trying to "break" such that I can get this error message
> > > to figure out what I am doing wrong in my case.
> > >
> > > y1<-rnorm(100, 2)
> > > x1<-rep(1:5, each=20)
> > > x2<-rep(1:2, each=10, times=10)
> > >
> > > ex.dat<-data.frame(cbind(y1,x1,x2))
> > >
> > >
> > > ex.dat$x1<-factor(ex.dat$x1, labels=c("A", "B", "C", "D", "E"))
> > > ex.dat$x2<-factor(ex.dat$x2, labels=c("B", "D"))
> > >
> > > a<-c("D", "F")
> > > a
> > >
> > > new.dat<-subset(ex.dat, x1 == a)
> > > new.dat
> > >
> > > I thought maybe I was getting errors because I had cases in my
> > > selection vector ('a' in this case) that weren't in my ex.dat
> > > list, but subset handles this fine and just gives me what it can
> > > find in the larger list.
> > >
> > > Any thoughts on how I can replicate the error? As far as I can
> > > tell, the only difference between the case where I am getting
> > > errors and the example above is that the levels of x1 in my case
> > > are words (i.e., "Smelly", "Howdy"), but strings are strings,
> > > aren't they?
> > >
> > > Mike
> > >
> > >
> > > Michael Rennie
> > > Ph.D. Candidate, University of Toronto at Mississauga
> > > 3359 Mississauga Rd. N.
> > > Mississauga, ON  L5L 1C6
> > > Ph: 905-828-5452  Fax: 905-828-3792
> > > www.utm.utoronto.ca/~w3rennie
> > >
> >
> >Petr Pikal
> >petr.pikal at precheza.cz
> 
> Michael Rennie
> Ph.D. Candidate, University of Toronto at Mississauga
> 3359 Mississauga Rd. N.
> Mississauga, ON  L5L 1C6
> Ph: 905-828-5452  Fax: 905-828-3792
> www.utm.utoronto.ca/~w3rennie 
> 

Petr Pikal
petr.pikal at precheza.cz


