[R] misbehavior with extract_numeric() from tidyr
arnaud gaboury
arnaud.gaboury at gmail.com
Mon Apr 20 12:28:58 CEST 2015
On Mon, Apr 20, 2015 at 12:09 PM, Jim Lemon <drjimlemon at gmail.com> wrote:
> Hi arnaud,
> At a guess, it is the two hyphens that are present in those strings. I
> think that the function you are using interprets them as subtraction
> operators and since the string following the hyphen would produce NA,
> the result would be NA.
I was thinking of the 'x' as the culprit (interpreted as a multiplication
sign), but you are indeed right that the hyphens are to blame:
library(stringr)  # str_replace() comes from stringr, not tidyr
noHyphens <- str_replace(playerStats[c(22,24)], '-', '')
extract_numeric(noHyphens)
[1] 276 83226
In fact, the function's source shows that the pattern keeps hyphens:
---------------------------------------------------------
extract_numeric
function (x)
{
    as.numeric(gsub("[^0-9.-]+", "", as.character(x)))
}
<environment: namespace:tidyr>
---------------------------------------------------------
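To see why the two strings end up as NA, it helps to look at what gsub() leaves behind before as.numeric() is applied. The following is a small check in base R, using only the two strings from the original post and the pattern shown above; the variable names are just for illustration.

```r
x <- c("Max Link Length x Days 276 km-days",
       "Largest Field MUs x Days 83,226 MU-days")

# Same pattern as in extract_numeric(): everything except digits, '.' and '-'
# is dropped, so the hyphen from "km-days" / "MU-days" survives the cleanup.
stripped <- gsub("[^0-9.-]+", "", x)
print(stripped)

# A digit string with a trailing hyphen is not a valid number, so the
# conversion gives NA (normally with a coercion warning, silenced here).
result <- suppressWarnings(as.numeric(stripped))
print(result)
```

So the hyphen is not evaluated as a subtraction; it is simply left in the string, and as.numeric() cannot parse what remains.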
Is there any particular reason for keeping the hyphen in the gsub()
pattern? Why not remove it?
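One plausible reason for the hyphen (an assumption; nothing in this thread confirms the author's intent) is to preserve a leading minus sign, so that negative values survive the cleanup. The sketch below illustrates that, and then defines a hypothetical helper, first_number(), which is not part of tidyr: instead of stripping characters, it pulls out the first number-like token, so hyphens elsewhere in the string are ignored.

```r
# With the hyphen kept in the character class, a leading minus sign survives.
negative <- as.numeric(gsub("[^0-9.-]+", "", "balance -42.5 EUR"))
print(negative)

# Hypothetical helper (not part of tidyr): return the first number found in
# each string, allowing an optional minus sign, thousands commas and decimals.
first_number <- function(x) {
  x <- as.character(x)
  m <- regexpr("-?[0-9][0-9,]*(\\.[0-9]+)?", x)
  hit <- !is.na(m) & m != -1
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)      # one match per element that matched
  as.numeric(gsub(",", "", out))    # drop thousands separators, then convert
}

values <- first_number(c("Max Link Length x Days 276 km-days",
                         "Largest Field MUs x Days 83,226 MU-days",
                         "balance -42.5 EUR",
                         "no digits here"))
print(values)
```

Note the different behaviour on strings with several numbers: this sketch would return only 5671448 for "5,671,448 AP l6,000,000 AP", where the strip-everything approach concatenates the digits.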
Thank you very much, Jim.
>
> Jim
>
>
> On Mon, Apr 20, 2015 at 7:46 PM, arnaud gaboury
> <arnaud.gaboury at gmail.com> wrote:
>> On Mon, Apr 20, 2015 at 9:10 AM, arnaud gaboury
>> <arnaud.gaboury at gmail.com> wrote:
>>> R 3.2.0 on Linux
>>> --------------------------------
>>>
>>> library(tidyr)
>>>
>>> playerStats <- c("LVL 10", "5,671,448 AP l6,000,000 AP",
>>>   "Unique Portals Visited 1,038",
>>>   "XM Collected 15,327,123 XM", "Hacks 14,268", "Resonators Deployed 11,126",
>>>   "Links Created 1,744", "Control Fields Created 294",
>>>   "Mind Units Captured 2,995,484 MUs",
>>>   "Longest Link Ever Created 75 km", "Largest Control Field 189,731 MUs",
>>>   "XM Recharged 3,006,364 XM", "Portals Captured 1,204",
>>>   "Unique Portals Captured 486",
>>>   "Resonators Destroyed 12,481", "Portals Neutralized 1,240",
>>>   "Enemy Links Destroyed 3,169",
>>>   "Enemy Control Fields Destroyed 1,394", "Distance Walked 230 km",
>>>   "Max Time Portal Held 240 days", "Max Time Link Maintained 15 days",
>>>   "Max Link Length x Days 276 km-days", "Max Time Field Held 4days",
>>>   "Largest Field MUs x Days 83,226 MU-days")
>>>
>>> -----------------------------------------------------------------------------------------------
>>> extract_numeric(playerStats)
>>>  [1] 10 56714486000000 1038 15327123 14268 11126 1744 294 2995484
>>> [10] 75 189731 3006364 1204 486 12481 1240 3169 1394
>>> [19] 230 240 15 NA 4 NA
>>>
>>> ------------------------------------------------------------------------------------------------
>>> playerStats[c(22,24)]
>>> [1] "Max Link Length x Days 276 km-days"
>>> [2] "Largest Field MUs x Days 83,226 MU-days"
>>> --------------------------------------------------------------------------------------------
>>>
>>> I do not understand why these two elements return NA when
>>> extract_numeric() works well for the others.
>>>
>>> Is something wrong with the settings in my environment?
>>
>> -------------------------------------------------------------------------
>> as.numeric(gsub("[^0-9]", "", playerStats))
>>  [1] 10 56714486000000 1038 15327123 14268 11126 1744 294 2995484
>> [10] 75 189731 3006364 1204 486 12481 1240 3169 1394
>> [19] 230 240 15 276 4 83226
>> --------------------------------------------------------------------
>>
>> The above command does the job, but I still cannot figure out why
>> extract_numeric() returns two NAs.
>>
>>>
>>> Thank you for hints.
>>>
>>>
>>>
>>> --
>>>
>>> google.com/+arnaudgabourygabx
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
--
google.com/+arnaudgabourygabx