[R] ifelse problem - bug or operator error
R. Michael Weylandt
michael.weylandt at gmail.com
Sat Aug 25 04:31:44 CEST 2012
On Fri, Aug 24, 2012 at 7:29 PM, Jennifer Sabatier
<plessthanpointohfive at gmail.com> wrote:
> AAAAAHHHHHHH I GOT IT!!!!!!!!!!
>
> And I *think* I understand about floating point arithmetic..
Well then you're doing much better than the rest of us: it's quite a
difficult subject and only gets trickier as you think about it more.
(Numerical analysis generally, not the definition of an IEEE 754 / IEC
60559 double.) You even get such fun as
-1 * 0 != 1 * 0
under some interpretations.
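To make the signed-zero remark concrete, here is a small sketch of what R does with the two products. The variable names are mine, purely for illustration; the point is only that the two zeros compare equal with == yet can still be told apart.

```r
# Illustrative names only; nothing here comes from the poster's data.
neg_zero <- -1 * 0   # IEEE 754 negative zero
pos_zero <-  1 * 0   # IEEE 754 positive zero

# Ordinary comparison treats the two zeros as equal.
neg_zero == pos_zero                              # TRUE

# Dividing by them exposes the sign.
1 / neg_zero                                      # -Inf
1 / pos_zero                                      # Inf

# identical() compares with == by default; num.eq = FALSE asks for a
# bitwise comparison, which does distinguish -0 from +0.
identical(neg_zero, pos_zero)                     # TRUE
identical(neg_zero, pos_zero, num.eq = FALSE)     # FALSE
```

So whether the two "differ" depends on which notion of equality you pick, which is all "under some interpretations" is meant to convey.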
>
> In this case vn$PM.DIST.TOT is the sum of proportions. So, it should
> be anywhere between 0 and 1.
>
> In our case, if it's anything other than 1 when vn$PM.EXP is greater
> than 0 then it means something is wrong with one of the variables
> used to sum vn$PM.DIST.TOT.
>
> I was worried making it an integer will cause cases of 0.4 to be 0 and
> look legal, when it's not (though it doesn't actually seem to be a
> problem).
>
> So, I just did what Michael and Peter suggested, after reading up on
> floating points.
>
> fpf <- 1e-05 # fpf = floating point fuzz
Though I suggested 1e-05 here, usually one uses slightly more stringent
testing: a general rule of thumb is the square root of machine
precision. In R terms,
sqrt(.Machine$double.eps)
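As a rough sketch of how that tolerance changes the flag: the data frame below is a made-up stand-in for the real vn (only the column names PM.EXP and PM.DIST.TOT come from this thread), and flag_exact / flag_tol are hypothetical names for the two variants.

```r
# Tolerance from the rule of thumb above; about 1.5e-08 for IEEE 754 doubles.
tol <- sqrt(.Machine$double.eps)

# Made-up stand-in for the poster's data: row 2 is "1" up to rounding
# noise, row 4 is genuinely wrong.
vn <- data.frame(
  PM.EXP      = c(0, 10, 10, 10),
  PM.DIST.TOT = c(0, 1 + 1e-10, 1, 0.4)
)

# Exact comparison: the rounding noise in row 2 is flagged as an error.
vn$flag_exact <- as.integer(vn$PM.EXP > 0 & vn$PM.DIST.TOT != 1)

# Tolerance comparison: only a real deviation from 1 is flagged.
vn$flag_tol <- as.integer(vn$PM.EXP > 0 & abs(vn$PM.DIST.TOT - 1) > tol)

print(vn)
```

With these made-up values, flag_exact marks rows 2 and 4 while flag_tol marks only row 4. The comparison is already vectorised, so as.integer() on the logical result does the job of ifelse(..., 1, 0).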
>
> vn$PM.DIST_flag<-ifelse(vn$PM.EXP > 0 & abs(vn$PM.DIST.TOT - 1) > fpf , 1, 0)
>
> YAAAAAYYYYY!!!!
>
> Thanks, solved AND I learned something new.
>
> Thanks, all, and have a GREAT weekend!
>
> Jen
Just for the "macro-take-away": this is the reason we don't really
like console printout instead of dput() to show a problem: if you dput
the original not-yet-ifelse-d numbers, you'll see that they really
aren't 1's, but that they are rounded upon regular printing.
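A minimal illustration of the printing issue, using a textbook pair of values rather than the poster's data. I use sprintf() here only because it shows the hidden digits on any R version; how many digits dput() itself writes depends on its control settings, so treat this as a sketch of the idea rather than of dput() output.

```r
# Two values that print identically but are not equal.
x <- c(0.1 + 0.2, 0.3)

# Regular printing rounds to getOption("digits") (7 by default),
# so both elements look like 0.3.
print(x)

# Yet they are different doubles.
x[1] == x[2]            # FALSE

# Asking for 17 significant digits reveals the difference.
sprintf("%.17g", x)     # "0.30000000000000004" "0.29999999999999999"

# A tolerance-based comparison treats them as equal.
isTRUE(all.equal(x[1], x[2]))   # TRUE
```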
Cheers and don't forget the old adage: 0.1*10 is hardly ever 1,
Michael
>
>
> On Fri, Aug 24, 2012 at 6:27 PM, Peter Ehlers <ehlers at ucalgary.ca> wrote:
>> I see that you got other responses while I was composing an answer.
>> Your 'example.csv' did come through for me, but I still can't
>> replicate your PM.DIST_flag variable. Specifically, observations
>> 30, 33, 36 and 40 are wrong.
>>
>> I agree with Rui, that there's something else going on. The data
>> you've sent can't be the data that yielded the 'flag' variable
>> or you didn't use the ifelse() function in the way that you've
>> shown.
>>
>> I would start with a clean R session and I would use the 'convert
>> logical to numeric' idea (or keep a logical rather than numeric
>> flag):
>>
>> vn <- transform(vn,
>> my_flag = ( (PM.EXP > 0) & (PM.DIST.TOT != 1) ) * 1 )
>>
>> It looks as though your PM.DIST.TOT variable is meant to be
>> integer. If so, you might want to ensure that it is that type.
>> Otherwise, you might want to use Michael's suggestion of using
>> abs(... - 1) < 1e-05.
>>
>> Peter Ehlers
>>
>>
>> On 2012-08-24 14:56, Jennifer Sabatier wrote:
>>>
>>> Hi Michael,
>>>
>>> Thanks for letting me know how to post data. I will try to upload it
>>> that way in a second.
>>>
>>> I can usually use code to make a reproducible dataset but this time
>>> with the ifelse behaving strangely (perhaps, it's probably me) I
>>> didn't think I could do it easily so I figured I would just put my
>>> data up.
>>>
>>> I will check out the R FAQ you mentioned.
>>>
>>> Thanks, again,
>>>
>>> Jen
>>>
>>>
>>>
>>> On Fri, Aug 24, 2012 at 5:50 PM, R. Michael Weylandt
>>> <michael.weylandt at gmail.com> wrote:
>>>>
>>>> On Fri, Aug 24, 2012 at 4:46 PM, Jennifer Sabatier
>>>> <plessthanpointohfive at gmail.com> wrote:
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> No, I never use attach(), exactly for the reasons you state. To do
>>>>> due diligence I did a search of code for the function and it didn't
>>>>> come up (I would have been shocked because I never use it!).
>>>>>
>>>>> Now that real data is up, does your suggestion still apply? I am
>>>>> reading it now.
>>>>>
>>>>
>>>> If you mean the data you sent to Peter, it got scrubbed by the list
>>>> servers as well (they are somewhat draconian, but appropriately so in
>>>> the long run). The absolute best way to send R data via email (esp on
>>>> this list) is to use the dput() function which will create a plain
>>>> text representation of your data _exactly_ as R sees it. It's a little
>>>> hard for the untrained eye to parse (I can usually get about 90% of
>>>> what it all means but there's some stuff with rownames = NA I've never
>>>> looked into) but it's perfectly reproducible to a different R session.
>>>> Then us having the same data is a simple copy+paste away.
>>>>
>>>> For more on dput() and reproducibility generally, see
>>>>
>>>> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>>>>
>>>> It could be the floating point thing (it's hard to say without knowing
>>>> how your data was calculated), but Rui seems to think not.
>>>>
>>>> M
>>>>
>>>>> Thanks,
>>>>>
>>>>> Jen
>>>>>
>>>>> On Fri, Aug 24, 2012 at 5:38 PM, R. Michael Weylandt
>>>>> <michael.weylandt at gmail.com> wrote:
>>>>>>
>>>>>> Off the wall / wild guess, do you use attach() frequently? Not
>>>>>> entirely sure how it would come up, but it tends to make weird errors
>>>>>> like this occur.
>>>>>>
>>>>>> M
>>>>>>
>>>>>> On Fri, Aug 24, 2012 at 4:36 PM, Jennifer Sabatier
>>>>>> <plessthanpointohfive at gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Rui,
>>>>>>>
>>>>>>> Thanks so much for responding but I think with my HTML problem the vn
>>>>>>> data you made must not be the same. I tried running your code on the
>>>>>>> data (I uploaded a copy) and I got the same thing I had before.
>>>>>>>
>>>>>>> Jen
>>>>>>>
>>>>>>> On Fri, Aug 24, 2012 at 5:28 PM, Rui Barradas <ruipbarradas at sapo.pt>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> 165114 1 0 0 0 0 417313 1 0 3546 1 0 4613 1 0 225460 1 0 6417 1
>>>>>>>> 1 23
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>