[R] vectorizing problem

David Winsemius dwinsemius at comcast.net
Mon Oct 4 20:13:20 CEST 2010


On Oct 4, 2010, at 2:02 PM, Henrique Dallazuanna wrote:

> Try this:
>
> do.call(rbind.data.frame,
>         mapply(cbind, DF$V1, strsplit(as.character(DF$V2), ","),
>                SIMPLIFY = FALSE))
>

Also try:

 > txt
        V1                           V2
1 2315100 NR_024005,NR_024004,AK093685
2 2315106                     DQ786314
 > str(txt)
'data.frame':	2 obs. of  2 variables:
  $ V1: int  2315100 2315106
  $ V2: chr  "NR_024005,NR_024004,AK093685" "DQ786314"
# Note that my V2 was input as chr; Henrique's appears to have been a
# factor column.

# rep() needs one count per row, so each V1 is repeated once per split piece
 > data.frame(idxs = rep(txt$V1, sapply(strsplit(txt$V2, split=","), length)),
               vals = unlist(strsplit(txt$V2, split=",")))
      idxs      vals
1 2315100 NR_024005
2 2315100 NR_024004
3 2315100  AK093685
4 2315106  DQ786314
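
For what it's worth, the same idea can be wrapped in a small function. This is only a sketch: expand_rows() and its argument names are made up for illustration (they are not part of base R), and it assumes the separator is a literal string rather than a regular expression. The as.character() call is there so that it should behave the same whether V2 arrives as chr or as a factor.

```r
# expand_rows() is a hypothetical helper, not a base R function.
# It returns one output row per separator-delimited piece of df[[vals]].
expand_rows <- function(df, id = "V1", vals = "V2", sep = ",") {
  # as.character() covers the case where the column was read in as a factor
  pieces <- strsplit(as.character(df[[vals]]), split = sep, fixed = TRUE)
  # one count per input row: how many pieces that row's string split into
  n <- vapply(pieces, length, integer(1))
  data.frame(idxs = rep(df[[id]], times = n),  # repeat each id n[i] times
             vals = unlist(pieces),
             stringsAsFactors = FALSE)
}

# Same two rows as in the txt example
txt <- data.frame(V1 = c(2315100L, 2315106L),
                  V2 = c("NR_024005,NR_024004,AK093685", "DQ786314"),
                  stringsAsFactors = FALSE)
res <- expand_rows(txt)
print(res)
```

Because strsplit(), rep() and unlist() each run once over the whole column, nothing is grown row by row with rbind(), which is most likely where the time goes in the original loop.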

-- 
David.

> On Mon, Oct 4, 2010 at 2:54 PM, Dylan Miracle
> <dylan.miracle at gmail.com> wrote:
>
>> Hello,
>>
>> I have a two column dataframe that
>> has entries that look like this:
>>
>> 2315100       NR_024005,NR_024004,AK093685
>> 2315106       DQ786314
>>
>> and I want to change this to look like this:
>>
>> 2315100       NR_024005
>> 2315100       NR_024004
>> 2315100       AK093685
>> 2315106       DQ786314
>>
>> I can do this with the following "for" loop but the dataframe (GPL)
>> has ~140,000 rows and this takes about 15 minutes:
>>
>>
>> extGPL <- matrix(nrow=0,ncol=2)
>> for (i in 1:length(GPL[,2])){
>>      aa <- unlist(strsplit(as.character(GPL[i,2]),"\\,"))
>>      bb <- rep(as.numeric(as.character(GPL[i,1])), length(aa))
>>      cc <- matrix(c(bb,aa),ncol = 2)
>>      extGPL <- rbind(extGPL,cc)
>> }
>>
>> Is there a way to vectorize this?
>>
>> Thanks,
>>
>> Dylan Miracle
>> University of Minnesota
>> GCD Department
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> -- 
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O

David Winsemius, MD
West Hartford, CT


