[R] "unsparse" a vector

Bert Gunter gunter.berton at gene.com
Wed Feb 8 21:58:21 CET 2012


I suspect there are cleverer ways to do it, especially using packages
like stringr and gsubfn, but using base tools, you can hack it without
too much effort:

?gregexpr

is the key. To get started (x is your example vector of character strings):

> gregexpr("[[:alpha:]]+[[:digit:]]+",x)
[[1]]
[1] 1 3
attr(,"match.length")
[1] 2 2
attr(,"useBytes")
[1] TRUE

[[2]]
[1] 1 3
attr(,"match.length")
[1] 2 2
attr(,"useBytes")
[1] TRUE

[[3]]
[1] 1
attr(,"match.length")
[1] 2
attr(,"useBytes")
[1] TRUE

[[4]]
[1] 1 3 5
attr(,"match.length")
[1] 2 2 2
attr(,"useBytes")
[1] TRUE
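The matched substrings can also be pulled straight out of that result with base R's regmatches(), available since R 2.14.0. This is a swapped-in shortcut, not part of the suggestion above. The sketch assumes the same toy vector from the question.

```r
# Toy input from the question
x <- c("A1B2", "A3C4", "B5", "C6A7B8")

# Start positions and lengths of every <name><value> token
m <- gregexpr("[[:alpha:]]+[[:digit:]]+", x)

# regmatches() turns those positions into the substrings themselves:
# a list with one character vector of tokens per input string
tokens <- regmatches(x, m)
str(tokens)
```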

Each component of the result gives the start positions of the "entries"
for one string, and its "match.length" attribute gives their lengths, so
start + length - 1 is the stop position. You can thus lapply() over this
list to extract the name-value substrings and decode them into the cells
of your final matrix/data frame.
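A minimal sketch of that lapply() route, using only base R. It assumes that column names contain only letters and values contain only digits, which is an assumption about the real data that the question does not state. Columns are sorted alphabetically, and absent entries are filled with 0 as in the desired output.

```r
x <- c("A1B2", "A3C4", "B5", "C6A7B8")
m <- gregexpr("[[:alpha:]]+[[:digit:]]+", x)

# Decode each string into a named numeric vector, e.g. c(A = 1, B = 2)
pairs <- lapply(seq_along(x), function(i) {
  starts <- as.vector(m[[i]])               # drop the match attributes
  if (starts[1] == -1) return(numeric(0))   # string with no matches
  stops  <- starts + attr(m[[i]], "match.length") - 1
  tokens <- substring(x[i], starts, stops)  # e.g. "A1", "B2"
  vals   <- as.numeric(sub("^[[:alpha:]]+", "", tokens))
  names(vals) <- sub("[[:digit:]]+$", "", tokens)
  vals
})

# Union of all column names seen anywhere
cols <- sort(unique(unlist(lapply(pairs, names))))

# Zero matrix, one row per input string; fill in the observed pairs
mat <- matrix(0, nrow = length(x), ncol = length(cols),
              dimnames = list(NULL, cols))
for (i in seq_along(pairs))
  if (length(pairs[[i]]) > 0) mat[i, names(pairs[[i]])] <- pairs[[i]]

df <- as.data.frame(mat)
print(df)
#   A B C
# 1 1 2 0
# 2 3 0 4
# 3 0 5 0
# 4 7 8 6
```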

Alternatively, if all your names are really 6 characters and all your
values are really two digits,
?nchar and ?substring will get you the name-value substrings directly.
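For the fixed-width case, a sketch using only nchar() and substring(). The function unsparse_fixed() and its name_width/value_width arguments are hypothetical names introduced here for illustration. With the real data it would be called with name_width = 6 and value_width = 2. Trailing characters that do not form a complete pair are silently ignored.

```r
# Hypothetical helper: split fixed-width <name><value> strings and spread
# them into a data frame, filling absent columns with 0
unsparse_fixed <- function(x, name_width, value_width) {
  pair_width <- name_width + value_width
  pairs <- lapply(x, function(s) {
    n <- nchar(s) %/% pair_width             # number of complete pairs
    if (n == 0) return(numeric(0))
    starts <- (seq_len(n) - 1) * pair_width + 1
    nm  <- substring(s, starts, starts + name_width - 1)
    val <- as.numeric(substring(s, starts + name_width,
                                starts + pair_width - 1))
    setNames(val, nm)
  })
  cols <- sort(unique(unlist(lapply(pairs, names))))
  mat <- matrix(0, nrow = length(x), ncol = length(cols),
                dimnames = list(NULL, cols))
  for (i in seq_along(pairs))
    if (length(pairs[[i]]) > 0) mat[i, names(pairs[[i]])] <- pairs[[i]]
  as.data.frame(mat)
}

# Toy example: 1-character names and 1-digit values
x <- c("A1B2", "A3C4", "B5", "C6A7B8")
df <- unsparse_fixed(x, name_width = 1, value_width = 1)
print(df)
```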

I leave the niggling details to you (or to other helpeRs -- especially
those who can suggest a more elegant approach).

-- Bert





On Wed, Feb 8, 2012 at 12:34 PM, Sam Steingold <sds at gnu.org> wrote:
> Suppose I have a vector of strings:
> c("A1B2","A3C4","B5","C6A7B8")
> [1] "A1B2"   "A3C4"   "B5"     "C6A7B8"
> where each string is a sequence of <column><value> pairs
> (fixed width, in this example both value and name are 1 character, in
> reality the column name is 6 chars and value is 2 digits).
> I need to convert it to a data frame:
> data.frame(A=c(1,3,0,7),B=c(2,0,5,8),C=c(0,4,0,6))
>  A B C
> 1 1 2 0
> 2 3 0 4
> 3 0 5 0
> 4 7 8 6
>
> how do I do that?
> thanks.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics



