[R] misbehavior with extract_numeric() from tidyr

arnaud gaboury arnaud.gaboury at gmail.com
Mon Apr 20 12:28:58 CEST 2015


On Mon, Apr 20, 2015 at 12:09 PM, Jim Lemon <drjimlemon at gmail.com> wrote:
> Hi arnaud,
> At a guess, it is the two hyphens that are present in those strings. I
> think that the function you are using interprets them as subtraction
> operators and since the string following the hyphen would produce NA,
> the result would be NA.

I had suspected the 'x' was the culprit (read as a multiplication sign),
but you are right: removing the hyphens fixes it.

library(stringr)  # provides str_replace()
noHyphens <- str_replace(playerStats[c(22,24)], '-', '')
extract_numeric(noHyphens)
[1]   276 83226
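
One caveat on this workaround, sketched in base R so that it runs without
stringr: like str_replace(), sub() replaces only the first match, so a string
containing two hyphens would still come back as NA, whereas gsub() (or
str_replace_all()) removes every hyphen. The helper name clean() and the
string two_hyphens are made up for illustration; clean() simply repeats the
body of extract_numeric() as printed in this message.

```r
# Same cleaning step as the extract_numeric() body quoted in this message.
clean <- function(x) as.numeric(gsub("[^0-9.-]+", "", as.character(x)))

two_hyphens <- "Max Link-Length x Days 276 km-days"  # hypothetical input

first_only <- sub("-", "", two_hyphens, fixed = TRUE)   # removes the first hyphen only
all_gone   <- gsub("-", "", two_hyphens, fixed = TRUE)  # removes every hyphen

res_first <- suppressWarnings(clean(first_only))  # "276-" is left over, so NA
res_all   <- clean(all_gone)                      # "276" parses as 276
```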


In fact, here is the source of the function:
---------------------------------------------------------
 extract_numeric
function (x)
{
    as.numeric(gsub("[^0-9.-]+", "", as.character(x)))
}
<environment: namespace:tidyr>
---------------------------------------------------------
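
To see where the NA comes from, it may help to run the two steps of that
function separately. This is only a sketch on three of the strings from the
original post; the variable names are mine. The pattern keeps digits, dots
and hyphens, so the hyphen of "km-days" survives as a trailing character and
as.numeric() cannot parse the result.

```r
pattern <- "[^0-9.-]+"  # matches runs of anything that is NOT a digit, dot or hyphen

x <- c("Distance Walked 230 km",
       "Max Link Length x Days 276 km-days",
       "Largest Field MUs x Days 83,226 MU-days")

kept <- gsub(pattern, "", x)  # step 1: strip everything the pattern matches
kept
# [1] "230"    "276-"   "83226-"

parsed <- suppressWarnings(as.numeric(kept))  # step 2: "276-" is not a number
parsed
# [1] 230  NA  NA
```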

Is there a particular reason for keeping the hyphen in the gsub() pattern?
Why not remove it as well?
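
Presumably the hyphen is kept so that negative numbers keep their sign;
dropping it from the pattern would silently turn -42.5 into 42.5. A possible
middle ground, sketched below under my own assumptions, is to extract the
first number and accept a minus sign only when it directly precedes a digit.
The helper first_number() is hypothetical (it is not part of tidyr), it
treats commas as thousands separators, and unlike extract_numeric() it stops
at the first number instead of concatenating all digits in the string.

```r
# Sign is lost if the hyphen is removed from the character class.
with_hyphen    <- as.numeric(gsub("[^0-9.-]+", "", "Balance -42.5 EUR"))  # -42.5
without_hyphen <- as.numeric(gsub("[^0-9.]+",  "", "Balance -42.5 EUR"))  #  42.5

# Hypothetical helper: first number in each string, optional leading minus.
first_number <- function(x) {
  x <- gsub(",", "", as.character(x), fixed = TRUE)  # drop thousands separators
  pos <- regexpr("-?[0-9]+(\\.[0-9]+)?", x)          # position of the first number
  hit <- !is.na(pos) & pos > 0                       # which elements matched
  out <- rep(NA_real_, length(x))
  out[hit] <- as.numeric(regmatches(x, pos))         # regmatches() returns matches only
  out
}

res <- first_number(c("Max Link Length x Days 276 km-days",
                      "Largest Field MUs x Days 83,226 MU-days",
                      "Balance -42.5 EUR",
                      "no digits here"))
res
# [1]   276.0 83226.0   -42.5      NA
```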

Thank you very much, Jim.

>
> Jim
>
>
> On Mon, Apr 20, 2015 at 7:46 PM, arnaud gaboury
> <arnaud.gaboury at gmail.com> wrote:
>> On Mon, Apr 20, 2015 at 9:10 AM, arnaud gaboury
>> <arnaud.gaboury at gmail.com> wrote:
>>> R 3.2.0 on Linux
>>> --------------------------------
>>>
>>> library(tidyr)
>>>
>>> playerStats <- c("LVL 10", "5,671,448 AP l6,000,000 AP",
>>> "Unique Portals Visited 1,038",
>>> "XM Collected 15,327,123 XM", "Hacks 14,268", "Resonators Deployed 11,126",
>>> "Links Created 1,744", "Control Fields Created 294",
>>> "Mind Units Captured 2,995,484 MUs",
>>> "Longest Link Ever Created 75 km", "Largest Control Field 189,731 MUs",
>>> "XM Recharged 3,006,364 XM", "Portals Captured 1,204",
>>> "Unique Portals Captured 486",
>>> "Resonators Destroyed 12,481", "Portals Neutralized 1,240",
>>> "Enemy Links Destroyed 3,169",
>>> "Enemy Control Fields Destroyed 1,394", "Distance Walked 230 km",
>>> "Max Time Portal Held 240 days", "Max Time Link Maintained 15 days",
>>> "Max Link Length x Days 276 km-days", "Max Time Field Held 4days",
>>> "Largest Field MUs x Days 83,226 MU-days")
>>>
>>> -----------------------------------------------------------------------------------------------
>>>  extract_numeric(playerStats)
>>>  [1]             10 56714486000000           1038       15327123          14268          11126           1744            294        2995484
>>> [10]             75         189731        3006364           1204            486          12481           1240           3169           1394
>>> [19]            230            240             15             NA              4             NA
>>>
>>> ------------------------------------------------------------------------------------------------
>>>  playerStats[c(22,24)]
>>> [1] "Max Link Length x Days 276 km-days"      "Largest Field MUs x Days 83,226 MU-days"
>>> --------------------------------------------------------------------------------------------
>>>
>>> I do not understand why these two elements return NA when
>>> extract_numeric() works well for all the others.
>>>
>>> Is something wrong with the settings in my environment?
>>
>> -------------------------------------------------------------------------
>>  as.numeric(gsub("[^0-9]", "",playerStats))
>>  [1]             10 56714486000000           1038       15327123          14268          11126           1744            294        2995484
>> [10]             75         189731        3006364           1204            486          12481           1240           3169           1394
>> [19]            230            240             15            276              4          83226
>> --------------------------------------------------------------------
>>
>> The above command does the job, but I still cannot figure out why
>> extract_numeric() returns two NAs.
>>
>>>
>>> Thank you for hints.
>>>
>>>
>>>
>>> --
>>>
>>> google.com/+arnaudgabourygabx





