[R] extracting characters from string

Soumendra soumendra at gmail.com
Fri Feb 11 09:26:33 CET 2011


Well, given the original statement of the problem, I believe it is
philosophically wrong to use the gsub approach. What if there are 50
underscores instead of 5, and you want to extract the characters
after the 23rd underscore? With gsub, you are fighting against the
pattern of underscores. With strsplit, we are using that pattern to
our advantage. Kind of. :)
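
To make that concrete, here is a minimal sketch. The helper name
nth_field and the 50-field test string are made up for illustration;
they are not from the original question.

```r
# Hypothetical helper: return the n-th underscore-delimited field of one string.
nth_field <- function(x, n, sep = "_") {
  # fixed = TRUE treats sep as a literal string, not a regular expression
  strsplit(x, sep, fixed = TRUE)[[1]][n]
}

# Made-up test string with 50 fields: "f1_f2_..._f50"
s <- paste0("f", 1:50, collapse = "_")

# The characters after the 23rd underscore are the 24th field
nth_field(s, 24)

# The example string from this thread
nth_field("abcd_efgh_XXXXX_12ab3_dfsfd", 3)
```

Only the index changes when the position changes; nothing about the
splitting step has to be rewritten.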

Besides, breaking the string up with strsplit also gives us the option
to iterate over the pieces, though that is not relevant here.
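
Since the original question mentioned a vector of such strings, here
is a rough sketch of how the split result could be iterated over. The
input vector x is made up from the two example strings in this thread.

```r
# Made-up input vector built from the example strings in the thread
x <- c("abcd_efgh_12ab3_dfsfd", "abcd_efgh_XXXXX_12ab3_dfsfd")

# strsplit() is vectorised: it returns a list with one character
# vector of fields per input element
parts <- strsplit(x, "_", fixed = TRUE)

# Take the third field of each element; `[` gives NA when an element
# has fewer than three fields
third <- vapply(parts, `[`, character(1), 3)
third
```

The list in parts can also be walked field by field, e.g. with
lapply, if more than one piece is needed.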




--
Soumendra Prasad Dhanee
Quantitative Analyst, Neural Technologies and Software Pvt. Ltd.

soumendra at neuraltechsoft.com, soumendra at maths.org.in, soumendra at gmail.com
+91-7498076111, +91-8100428686

--
"When you understand why you dismiss all the other possible gods, you
will understand why I dismiss yours." - Stephen Roberts



On 11 February 2011 05:55, jim holtman <jholtman at gmail.com> wrote:
> A safer way to make sure you don't match the underscore:
>
>> gsub("[^_]*_[^_]*_([^_]*).*", "\\1",  "abcd_efgh_XXXXX_12ab3_dfsfd")
> [1] "XXXXX"
>
>
> On Thu, Feb 10, 2011 at 2:06 PM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
>> So, a way could be:
>>
>> gsub("(.*)_(.*)_(.*)_.*", "\\3",  "abcd_efgh_XXXXX_12ab3_dfsfd")
>>
>> On Thu, Feb 10, 2011 at 3:47 PM, Soumendra <soumendra at gmail.com> wrote:
>>
>>> Hi Henrique,
>>>
>>> I believe your solution is wrong, as it is tailored to return 12ab3,
>>> whereas Yan seems to be asking for the characters after the second
>>> underscore and before the third underscore.
>>>
>>> For example, gsub(".*_.*_(.*)_.*", "\\1",
>>> "abcd_efgh_XXXXX_12ab3_dfsfd") would still yield 12ab3 even though, as
>>> I understand it, it should have output XXXXX.
>>>
>>> I think a straightforward solution would do the job:
>>>
>>> strsplit("abcd_efgh_12ab3_dfsfd", "_")[[1]][3]
>>>
>>> strsplit("abcd_efgh_XXXXX_12ab3_dfsfd", "_")[[1]][3] has the output
>>> XXXXX, for example.
>>>
>>> Of course, I would be wrong if Yan specifically wanted to find the
>>> string 12ab3. But in that case, he would have been asking for matching
>>> (and locating) that substring instead of extracting it.
>>>
>>> Regards,
>>>
>>> Soumendra
>>>
>>>
>>> On 10 February 2011 11:52, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
>>> > Try this:
>>> >
>>> > gsub(".*_.*_(.*)_.*", "\\1", "abcd_efgh_12ab3_dfsfd")
>>> >
>>> > On Thu, Feb 10, 2011 at 9:42 AM, Yan Jiao <y.jiao at ucl.ac.uk> wrote:
>>> >
>>> >> Dear R gurus,
>>> >>
>>> >>
>>> >>
>>> >> If I have a vector of strings like "abcd_efgh_12ab3_dfsfd", how
>>> >> could I extract "12ab3", i.e. the characters after the second
>>> >> underscore and before the third underscore?
>>> >>
>>> >>
>>> >>
>>> >> Tons of thanks
>>> >>
>>> >>
>>> >>
>>> >> yan
>>> >>
>>> >> ______________________________________________
>>> >> R-help at r-project.org mailing list
>>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>>> >> PLEASE do read the posting guide
>>> >> http://www.R-project.org/posting-guide.html
>>> >> and provide commented, minimal, self-contained, reproducible code.
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Henrique Dallazuanna
>>> > Curitiba-Paraná-Brasil
>>> > 25° 25' 40" S 49° 16' 22" O
>>>
>>
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
>


