[R] string handling

Gabor Grothendieck ggrothendieck at gmail.com
Fri Jun 4 21:03:00 CEST 2010


Here is a slightly simpler variant of the strapply solution:

> lapply(DF, strapply, "(.)/(.)", c, simplify = rbind)
$var1
     [,1] [,2]
[1,] "G"  "G"
[2,] "A"  "T"
[3,] "G"  "G"

$var2
     [,1] [,2]
[1,] "C"  "T"
[2,] "C"  "C"
[3,] "A"  "A"
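
For anyone who would rather not load a package, a rough base R equivalent might look like the sketch below. It is only a sketch under some assumptions: the helper name extract_pair is made up for illustration, and each string is assumed to contain at least one "x/y" pair. Unlike strapply, regexec only reports the first match in each string, which is all that is needed for the sample data.

```r
# Sample data from the thread, read as character columns.
Lines <- "var1 var2
9G/G09 abd89C/T90
10A/T9 32C/C
90G/G A/A"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)

# Hypothetical helper: return a two-column matrix holding the characters
# immediately left and right of the first "/" in each element of x.
extract_pair <- function(x) {
  # regexec() locates the full match plus both capture groups;
  # regmatches() turns those positions into character vectors.
  m <- regmatches(x, regexec("(.)/(.)", x))
  # Drop the full match (first element) and keep the two groups.
  do.call(rbind, lapply(m, function(hit) hit[-1]))
}

# as.character() also covers the case where a column is a factor.
res <- lapply(DF, function(col) extract_pair(as.character(col)))
res
```

Strings with no match would contribute nothing to the result here, so the rows would no longer line up with the input; that case would need separate handling.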


On Fri, Jun 4, 2010 at 8:08 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> This solution using strapply in gsubfn is along the same lines as the
> stringr solution.  First we read in the data using as.is = TRUE so
> that we get character rather than factor columns.  On the other hand,
> if your data is already in columns of class factor then just replace
> strapply(x, ...) with strapply(as.character(x), ...) below.  Then
> lapply over the columns of DF using strapply on each one.  See the home
> page at http://gsubfn.googlecode.com for more.
>
>> Lines <- "var1        var2
> + 9G/G09    abd89C/T90
> + 10A/T9    32C/C
> + 90G/G      A/A"
>>
>> library(gsubfn)
>> DF <- read.table(textConnection(Lines), header = TRUE, as.is = TRUE)
>> lapply(DF, function(x) strapply(x, "(.)/(.)", c, simplify = rbind))
> $var1
>      [,1] [,2]
> [1,] "G"  "G"
> [2,] "A"  "T"
> [3,] "G"  "G"
>
> $var2
>      [,1] [,2]
> [1,] "C"  "T"
> [2,] "C"  "C"
> [3,] "A"  "A"
>
>
> A slight simplification is also possible using gsubfn's capability of
> representing a one-line function as a formula.  We just preface lapply
> with fn$ and then formulas appearing in the arguments (subject to
> certain rules) are interpreted as functions.  Here, the formula in the
> second argument to lapply is interpreted as the anonymous function we
> used above:
>
>> fn$lapply(DF, x ~ strapply(x, "(.)/(.)", c, simplify = rbind))
> $var1
>      [,1] [,2]
> [1,] "G"  "G"
> [2,] "A"  "T"
> [3,] "G"  "G"
>
> $var2
>      [,1] [,2]
> [1,] "C"  "T"
> [2,] "C"  "C"
> [3,] "A"  "A"
>
> On Thu, Jun 3, 2010 at 2:18 PM, karena <dr.jzhou at gmail.com> wrote:
>>
>> I have a data.frame as the following:
>> var1        var2
>> 9G/G09    abd89C/T90
>> 10A/T9    32C/C
>> 90G/G      A/A
>> .             .
>> .             .
>> .             .
>> 10T/C      00G/G90
>>
>> What I want is to get the letters immediately to the left and right of '/'.
>> For example, for "9G/G09" I only want "G" and "G", and for "abd89C/T90" I
>> only want "C" and "T". How can I get these?
>>
>> thank you,
>>
>> karena
>>
>


