[R] "unsparse" a vector

Bert Gunter gunter.berton at gene.com
Wed Feb 8 22:02:00 CET 2012


Sam:

On Wed, Feb 8, 2012 at 12:56 PM, Sam Steingold <sds at gnu.org> wrote:
> To be clear, I can do that with nested for loops:
>
> v <- c("A1B2","A3C4","B5","C6A7B8")
> l <- strsplit(gsub("(.{2})","\\1,",v),",")
> d <- data.frame(A=vector(length=4,mode="integer"),
>                B=vector(length=4,mode="integer"),
>                C=vector(length=4,mode="integer"))
>
> for (i in 1:length(l)) {
>  l1 <- l[[i]]
>  for (j in 1:length(l1)) {
>    d[[substring(l1[j],1,1)]][i] <- as.numeric(substring(l1[j],2,2))
>  }
> }
>
>
> but I am afraid that handling 1,000,000 (=length(unlist(l))) strings in
> a loop will kill me.

Well, that depends on how "dead" you can stand being. Try it with a
1000-entry subvector and see how bad it gets. A few extra minutes of
computing time to save many more minutes of programming time seems a
reasonable tradeoff. Alternatively, see ?cmpfun in the compiler package
to compile your solution into bytecode, which might give a severalfold
reduction in time (or not). The calculation could also be parallelized
using the parallel package, I'm sure.
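
A rough sketch of that experiment, in case it helps. It wraps your loop
in a function, times it on a 1000-entry subvector, and times a
byte-compiled copy made with compiler::cmpfun(). The function name
unsparse, the cols argument, and the resampled test vector are made up
for illustration, and the code assumes the 1-character names and
1-digit values of your toy example. On recent versions of R, where
functions are byte-compiled just in time by default, the two timings
may well be indistinguishable.

```r
library(compiler)  # ships with R; provides cmpfun()

# Hypothetical wrapper around the nested loop from the question.
# Assumes 1-character column names and 1-digit values, as in the toy example.
unsparse <- function(v, cols = c("A", "B", "C")) {
  # Insert a comma after every 2-character pair, then split on the commas.
  l <- strsplit(gsub("(.{2})", "\\1,", v), ",")
  # One integer column per name, pre-filled with zeros.
  d <- as.data.frame(matrix(0L, nrow = length(v), ncol = length(cols),
                            dimnames = list(NULL, cols)))
  for (i in seq_along(l)) {
    for (p in l[[i]]) {
      d[[substring(p, 1, 1)]][i] <- as.integer(substring(p, 2, 2))
    }
  }
  d
}

# Made-up test data: the four example strings resampled to 1000 entries.
set.seed(1)
v   <- c("A1B2", "A3C4", "B5", "C6A7B8")
big <- sample(v, 1000, replace = TRUE)

# Byte-compiled copy of the same function.
unsparse_c <- cmpfun(unsparse)

print(system.time(d1 <- unsparse(big)))    # plain version
print(system.time(d2 <- unsparse_c(big)))  # byte-compiled version

# Both versions must produce the same data frame.
stopifnot(identical(d1, d2))
```

If the 1000-entry timing is tolerable, multiplying it by the ratio of
the full length to 1000 gives a first guess for the whole job, though
element-wise assignment into a data frame need not scale linearly.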

-- Bert
>
>
>> * Sam Steingold <sds at gnu.org> [2012-02-08 15:34:38 -0500]:
>>
>> Suppose I have a vector of strings:
>> c("A1B2","A3C4","B5","C6A7B8")
>> [1] "A1B2"   "A3C4"   "B5"     "C6A7B8"
>> where each string is a sequence of <column><value> pairs
>> (fixed width, in this example both value and name are 1 character, in
>> reality the column name is 6 chars and value is 2 digits).
>> I need to convert it to a data frame:
>> data.frame(A=c(1,3,0,7),B=c(2,0,5,8),C=c(0,4,0,6))
>>   A B C
>> 1 1 2 0
>> 2 3 0 4
>> 3 0 5 0
>> 4 7 8 6
>>
>> how do I do that?
>> thanks.
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics


