[R] extracting characters from string

jim holtman jholtman at gmail.com
Fri Feb 11 01:25:25 CET 2011


A safer way is to use a character class that cannot match the underscore, so each field stops at the next one:

> gsub("[^_]*_[^_]*_([^_]*).*", "\\1",  "abcd_efgh_XXXXX_12ab3_dfsfd")
[1] "XXXXX"
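
The original question was about a vector of such strings, so here is a rough sketch that runs the approaches from this thread side by side. The vector `x` and the names `greedy`, `third_regex` and `third_split` are made up for illustration; I use sub() rather than gsub() because the anchored pattern can only match once per string anyway.

```r
# Hypothetical input: one string with three underscores, one with four.
x <- c("abcd_efgh_12ab3_dfsfd", "abcd_efgh_XXXXX_12ab3_dfsfd")

# Greedy ".*" grabs as much as it can, so with four underscores the
# capture group ends up on the field before the LAST underscore.
greedy <- sub(".*_.*_(.*)_.*", "\\1", x)

# "[^_]*" cannot cross an underscore, so the group is always field 3.
third_regex <- sub("^[^_]*_[^_]*_([^_]*).*$", "\\1", x)

# strsplit() returns a list with one character vector per string;
# take the third piece of each.
third_split <- vapply(strsplit(x, "_", fixed = TRUE),
                      function(p) p[3], character(1))

print(greedy)
print(third_regex)
print(third_split)
```

One difference to keep in mind: for a string with fewer than two underscores, sub() finds no match and returns the string unchanged, whereas the strsplit() version returns NA.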


On Thu, Feb 10, 2011 at 2:06 PM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
> So, a way could be:
>
> gsub("(.*)_(.*)_(.*)_.*", "\\3",  "abcd_efgh_XXXXX_12ab3_dfsfd")
>
> On Thu, Feb 10, 2011 at 3:47 PM, Soumendra <soumendra at gmail.com> wrote:
>
>> Hi Henrique,
>>
>> I believe your solution is wrong as it is fitted to find 12ab3,
>> whereas Yan seems to be asking for the characters after the second
>> underscore and before the third underscore.
>>
>> For example, gsub(".*_.*_(.*)_.*", "\\1",
>> "abcd_efgh_XXXXX_12ab3_dfsfd") would still yield 12ab3 even though, as
>> I understand it, it should have output XXXXX.
>>
>> I think a straightforward solution would do the job:
>>
>> strsplit("abcd_efgh_12ab3_dfsfd", "_")[[1]][3]
>>
>> strsplit("abcd_efgh_XXXXX_12ab3_dfsfd", "_")[[1]][3] has the output
>> XXXXX, for example.
>>
>> Of course, I would be wrong if Yan specifically wanted to find the
>> string 12ab3. But in that case, he would have been asking for matching
>> (and locating) that substring instead of extracting it.
>>
>> Regards,
>>
>> Soumendra
>>
>>
>> --
>> Soumendra Prasad Dhanee
>> Quantitative Analyst, Neural Technologies and Software Pvt. Ltd.
>>
>> soumendra at neuraltechsoft.com, soumendra at maths.org.in, soumendra at gmail.com
>> +91-7498076111, +91-8100428686
>>
>> --
>> "When you understand why you dismiss all the other possible gods, you
>> will understand why I dismiss yours." - Stephen Roberts
>>
>>
>>
>> On 10 February 2011 11:52, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
>> > Try this:
>> >
>> > gsub(".*_.*_(.*)_.*", "\\1", "abcd_efgh_12ab3_dfsfd")
>> >
>> > On Thu, Feb 10, 2011 at 9:42 AM, Yan Jiao <y.jiao at ucl.ac.uk> wrote:
>> >
>> >> Dear R gurus,
>> >>
>> >> If I have a vector of strings like "abcd_efgh_12ab3_dfsfd", how can I
>> >> extract "12ab3", i.e. the characters after the second underscore and
>> >> before the third underscore?
>> >>
>> >> Tons of thanks
>> >>
>> >> yan
>> >>
>> >>
>> >> ______________________________________________
>> >> R-help at r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >
>> >
>> >
>> > --
>> > Henrique Dallazuanna
>> > Curitiba-Paraná-Brasil
>> > 25° 25' 40" S 49° 16' 22" O
>>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


