[R] Is this a bug or am I making a mistake?

Walter Anderson wandrson01 at gmail.com
Mon Jan 6 20:16:31 CET 2014


On 01/06/2014 11:14 AM, Sarah Goslee wrote:
> Hi Walter,
>
> I can't reproduce your results. Please provide some data that
> demonstrates the problem.
>
> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>
> subset() and [ differ in their handling of NA values, and you don't
> need the dd$ in the arguments to subset().
>
> But those don't explain your result given the information provided.
> Please provide more information.
>
> Sarah
>
>
> On Mon, Jan 6, 2014 at 12:06 PM, Walter Anderson <wandrson01 at gmail.com> wrote:
>> I have a data frame that I am extracting some records from and noticed the
>> following issue
>>
>> I originally used tmp <- subset(dd, dd$EVYEAR==2012 & dd$EVMONTH=='02')
>>
>> and noticed that I wasn't ending up with all of the records I should have;
>> however, when I used
>>
>> tmp <- dd[dd$EVYEAR==2012 & dd$EVMONTH=='02',]
>>
>> I did get all of the records I should have.
>>
>> I thought the two forms were equivalent, am I mistaken?
>>
Thanks, everyone, for the responses.  I didn't provide a reproducible
example because the data I experienced this issue with is quite large
(> 40MB), and I have not been able to reproduce the problem with any
other data set.  I have also run the same query in Microsoft Access on
the original dbf file that the data frame is built from, and confirmed
that the second form (dd[QUERY,]) produces the correct results.  It
doesn't appear that any of the affected records (or any records in the
data frame) contain NA values.
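
For reference, here is a minimal sketch of the NA difference Sarah mentions. The data frame is an invented toy example (only the column names come from my query), so it shows how the two forms can diverge in general, not what is happening with my data.

```r
# Toy data: invented values, with one NA in EVYEAR on purpose.
dd <- data.frame(
  EVYEAR  = c(2012, 2012, NA, 2011),
  EVMONTH = c("02", "02", "02", "02"),
  stringsAsFactors = FALSE
)

# The logical condition is TRUE, TRUE, NA, FALSE for these rows.
cond <- dd$EVYEAR == 2012 & dd$EVMONTH == "02"

# subset() treats NA in the condition as FALSE and drops that row.
# Column names can be used directly, without the dd$ prefix.
by_subset <- subset(dd, EVYEAR == 2012 & EVMONTH == "02")

# [ keeps a placeholder row filled with NA for each NA in the condition.
by_bracket <- dd[cond, ]

nrow(by_subset)   # 2
nrow(by_bracket)  # 3
```

So when the condition contains NA, the [ form returns more rows than subset(), and the extra rows consist entirely of NA.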

I am not really looking for any particular solution, but I was surprised
by the different results from what I presumed to be the same query.  If
this is believed to be a possible bug, I would be glad to package up the
data that is generating the issue, but I am not sure where to put such a
large data set.
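
One possible way around the size problem is to check how the condition evaluates and post only a small extract with dput(). The sketch below uses an invented toy data frame in place of the real one, and the choice of which rows to extract (rows where the condition is NA, plus a few matching rows) is an assumption about what would be useful, not an established recipe.

```r
# Toy stand-in for the real data frame; values are invented.
dd <- data.frame(
  EVYEAR  = c(2012, 2012, 2011, NA),
  EVMONTH = c("02", "03", "02", "02"),
  stringsAsFactors = FALSE
)

cond <- dd$EVYEAR == 2012 & dd$EVMONTH == "02"

# Tabulate the condition, including any NA results.
table(cond, useNA = "always")

# Row positions where the condition evaluates to NA.
na_rows <- which(is.na(cond))

# Small extract: the NA rows plus up to five matching rows.
sample_rows <- sort(unique(c(na_rows, head(which(cond), 5))))

# dput() prints a text representation that can be pasted into an email.
dput(dd[sample_rows, c("EVYEAR", "EVMONTH")])
```

If the table shows no NA entries on the real data, this extract would not capture the discrepancy, and the rows that differ between the two results would have to be identified some other way.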



