[R] Is this a bug or am I making a mistake?

William Dunlap wdunlap at tibco.com
Mon Jan 6 20:38:02 CET 2014


You could compare the output of
    z1 <- with(dd, dd$EVYEAR==2012 & dd$EVMONTH=='02')
(which is like subset()) with that of
    z2 <- dd$EVYEAR==2012 & dd$EVMONTH=='02'
(evaluated from within the same context) using
    table(z1, z2, exclude=NULL)
That may show something useful.
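
As a rough illustration of that comparison, here is a minimal sketch on a made-up data frame. The values below are invented for the example and are not the poster's data; only the column names EVYEAR and EVMONTH come from the thread. It also shows the NA-handling difference between subset() and [ mentioned in the quoted message below, which may or may not be relevant to the original problem.

```r
# Toy stand-in for the poster's `dd`; the values are hypothetical.
dd <- data.frame(
  EVYEAR  = c(2012, 2012, NA, 2011, 2012),
  EVMONTH = c("02", "03", "02", "02", NA),
  stringsAsFactors = FALSE
)

# Condition evaluated with the data frame's columns visible first,
# which is how subset() evaluates its condition.
z1 <- with(dd, dd$EVYEAR == 2012 & dd$EVMONTH == "02")

# Same condition evaluated directly in the calling environment,
# which is what dd[cond, ] uses.
z2 <- dd$EVYEAR == 2012 & dd$EVMONTH == "02"

# Cross-tabulate the two, keeping NA as its own level. Any counts off
# the diagonal would mean the two conditions disagree for some rows.
print(table(z1, z2, exclude = NULL))

# subset() drops rows where the condition is NA, whereas `[` returns
# an all-NA row for each NA in the logical index.
by_subset  <- subset(dd, EVYEAR == 2012 & EVMONTH == "02")
by_bracket <- dd[z2, ]
print(nrow(by_subset))
print(nrow(by_bracket))
```

On this toy data z1 and z2 are identical, so the row counts differ only because of the NA handling. With real data, z1 and z2 could differ only if some name in the condition resolves differently inside the data frame than outside it, which is what the table() call would reveal.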

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Walter Anderson
> Sent: Monday, January 06, 2014 11:17 AM
> To: Sarah Goslee
> Cc: R Help
> Subject: Re: [R] Is this a bug or am I making a mistake?
> 
> On 01/06/2014 11:14 AM, Sarah Goslee wrote:
> > Hi Walter,
> >
> > I can't reproduce your results. Please provide some data that
> > demonstrates the problem.
> >
> > http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
> >
> > subset() and [ differ in their handling of NA values, and you don't
> > need the dd$ in the arguments to subset().
> >
> > But those don't explain your result given the information provided.
> > Please provide more information.
> >
> > Sarah
> >
> >
> > On Mon, Jan 6, 2014 at 12:06 PM, Walter Anderson <wandrson01 at gmail.com> wrote:
> >> I have a data frame that I am extracting some records from and noticed the
> >> following issue:
> >>
> >> I originally used tmp <- subset(dd, dd$EVYEAR==2012 & dd$EVMONTH=='02')
> >>
> >> and noticed that I wasn't ending up with all of the records I should have;
> >> however, when I used
> >>
> >> tmp <- dd[dd$EVYEAR==2012 & dd$EVMONTH=='02',]
> >>
> >> I did get all of the records I should have.
> >>
> >> I thought the two forms were equivalent; am I mistaken?
> >>
> Thanks everyone for the responses.  I didn't provide a reproducible test,
> since the data I experienced this issue with is quite large (> 40MB)
> and I have not been able to reproduce the problem with any other data
> set.  I have also performed the subset using Microsoft Access on the
> original dbf file I use for the data frame and confirmed that the second
> query format (dd[QUERY,]) is producing the correct results.  It doesn't
> appear that any of the affected records (or any records in the data
> frame) contain NA values.
> 
> I am not really looking for any particular solution, but was surprised
> by the different results from what I presumed to be the same query.  If
> it is believed to be a possible bug, I would be glad to package up the
> data that is generating the issue, but I am not sure where to place such
> a large data set.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
