[R] splitting character strings and converting to numeric vectors

Gabor Grothendieck ggrothendieck at gmail.com
Thu May 6 14:58:44 CEST 2010


Try this:

> x <- c("-448623_854854", "-448563_854850", "-448442_854842", "-448301_854833",
+ "-448060_854818", "-446828_854736")
>
>
> read.table(textConnection(x), sep = "_")
       V1     V2
1 -448623 854854
2 -448563 854850
3 -448442 854842
4 -448301 854833
5 -448060 854818
6 -446828 854736
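To add the two parts back to the original data frame as new columns, something like the following sketch may work. It assumes a data frame named df with a character (or factor) column XY. The names df, X and Y are placeholders, not from the original post. It also uses the text argument that current versions of R accept in read.table(), in place of textConnection().

```r
# Hypothetical data frame standing in for the poster's data. XY may be a
# factor (the default for strings in data.frame() in older R versions).
df <- data.frame(XY = c("-448623_854854", "-448563_854850", "-446828_854736"),
                 stringsAsFactors = TRUE)

# Split each "X_Y" string at the underscore. read.table() converts each
# resulting column to a numeric (here integer) type.
parts <- read.table(text = as.character(df$XY), sep = "_",
                    col.names = c("X", "Y"))

# Append the two numeric columns to the original data frame.
df <- cbind(df, parts)
str(df)
```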

Here is another way.

> library(gsubfn) # see http://gsubfn.googlecode.com
>
> data.frame(strapply(x, "[-0-9]+", as.numeric, simplify = rbind))
       X1     X2
1 -448623 854854
2 -448563 854850
3 -448442 854842
4 -448301 854833
5 -448060 854818
6 -446828 854736

If you omit the data.frame() call, strapply() returns a numeric matrix instead.
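A base-R sketch without gsubfn, using strsplit() as in the original question, is shown below. It also demonstrates one common reason a coercion like the one attempted below can appear to fail. A matrix holds a single type, so assigning as.numeric() results into a column of a character matrix turns the numbers back into strings. Whether this is exactly what happened in the original attempt is an assumption. Changing the storage mode of the whole matrix avoids the problem and keeps its dimensions.

```r
x <- c("-448623_854854", "-448563_854850", "-448442_854842", "-448301_854833",
       "-448060_854818", "-446828_854736")

# strsplit() returns a list of character vectors; rbind them into a
# 6 x 2 character matrix. fixed = TRUE treats "_" as a literal string.
chr <- do.call(rbind, strsplit(x, "_", fixed = TRUE))

# Pitfall: converting one column and assigning it back into the character
# matrix coerces the values to character again.
bad <- chr
bad[, 1] <- as.numeric(bad[, 1])
is.character(bad)  # still TRUE

# Instead, change the storage mode of the whole matrix (dim is kept).
num <- chr
storage.mode(num) <- "double"
colnames(num) <- c("X", "Y")

# Optionally convert to a data frame with numeric columns X and Y.
xy <- as.data.frame(num)
xy
```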


On Thu, May 6, 2010 at 8:12 AM, Jim Bouldin <jrbouldin at ucdavis.edu> wrote:
>
> This seemingly should be quite simple but I can't solve it:
>
> I have a long character vector of geographic data (data frame column named
> "XY") whose elements vary in length (from 11 to 14 chars).  Each element is
> structured as a set of digits, then an underscore, then more digits, e.g.:
>
>> data.frame(head(as.character(XY)))
>  head.as.character.XY..
> 1         -448623_854854
> 2         -448563_854850
> 3         -448442_854842
> 4         -448301_854833
> 5         -448060_854818
> 6         -446828_854736
>
> I simply need to separate the two sets of digits from each other and assign
> them into new columns.  The closest I've been able to get is by:
>
>> test=t(as.matrix(data.frame(head(strsplit(as.character(XY), "\\_")))))
>> test
>                       [,1]      [,2]
> c...448623....854854.. "-448623" "854854"
> c...448563....854850.. "-448563" "854850"
> c...448442....854842.. "-448442" "854842"
> c...448301....854833.. "-448301" "854833"
> c...448060....854818.. "-448060" "854818"
> c...446828....854736.. "-446828" "854736"
>
> So far so good, but columns 1:2 will not coerce to either numeric or
> integer, for unknown reasons.  Thanks for any help (and/or suggestions on a
> better way to code this).
>
>
>
> Jim Bouldin, PhD
> Research Ecologist
> Department of Plant Sciences, UC Davis
> Davis CA, 95616
> 530-554-1740
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


