[R] Decomposing a List
William Dunlap
wdunlap at tibco.com
Fri Apr 26 19:09:13 CEST 2013
You might add vapply() to your repertoire, as it is quicker than sapply() and
also does some error checking on your input data. E.g., your f2 returns
a matrix whose columns are the elements of the list l, and you assume that
each element of l contains 2 character strings.
f2 <- function(l)matrix(unlist(l),nr=2)
Here is a function based on vapply() that returns the same thing but also
verifies that each element of l is really a 2-long character vector.
f2v <- function (l) vapply(l, function(x) x, FUN.VALUE = character(2))
and a function to generate datasets of various sizes
makeL <- function(n)strsplit(paste(sample(LETTERS,n,rep=TRUE),sample(1:10,n,rep=TRUE),sep="+"),"+",fix=TRUE)
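For a quick look at what makeL() produces, a small call can be inspected directly. This is only a sketch: the argument names that the one-liner abbreviates (rep, fix) are spelled out here as replace and fixed, and the seed and the name `small` are arbitrary choices for illustration, not part of the original post.

```r
# Same generator as above, with the partially matched arguments written in full
makeL <- function(n) {
  strsplit(paste(sample(LETTERS, n, replace = TRUE),
                 sample(1:10, n, replace = TRUE),
                 sep = "+"),
           "+", fixed = TRUE)
}

set.seed(1)          # arbitrary seed, only so the run is repeatable
small <- makeL(5)    # hypothetical small dataset
str(small)           # a list of 5 character vectors, each of length 2
```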
Timing the functions on a million-long list I get
> l <- makeL(n=10^6)
> system.time( r2 <- f2(l) )
user system elapsed
0.088 0.000 0.090
> system.time( r2v <- f2v(l) )
user system elapsed
0.92 0.00 0.92
> identical(r2, r2v)
[1] TRUE
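Since both functions return a 2 x N matrix, the two vectors asked for in the original question are simply its rows. A minimal sketch, assuming a small hand-made list L in the shape strsplit() produces (the names L, m, V1 and V2 follow the original question and are otherwise arbitrary):

```r
# Hypothetical small input, shaped like the output of strsplit()
L <- list(c("A1", "B1"), c("A2", "B2"), c("A3", "B3"))

# vapply() checks that every element is a character vector of length 2
m <- vapply(L, function(x) x, FUN.VALUE = character(2))

V1 <- m[1, ]   # first string of each pair
V2 <- m[2, ]   # second string of each pair
```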
vapply() is ten times slower than unlist() but three times faster than sapply(x,function(x)x). However,
when you give it data that doesn't meet your expectations, which is common when using strsplit(),
f2v tells you about the problem and f2 gives you an incorrect result:
> l[[10]] <- c("a","b","c","d")
> system.time( r2v <- f2v(l) )
Error in vapply(l, function(x) x, FUN.VALUE = character(2)) :
values must be length 2,
but FUN(X[[10]]) result is length 4
Timing stopped at: 0.004 0 0.002
> system.time( rv <- f2(l) )
user system elapsed
0.088 0.008 0.095
> dim(rv) # you will have alignment problems later
[1] 2 1000001
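The same failure can be reproduced on a tiny example, which also shows one way to locate the offending elements before building the matrix. This is a sketch under assumptions: the input strings and the names x, bad, m and msg are made up for illustration, and f2 is written with nrow spelled out.

```r
f2  <- function(l) matrix(unlist(l), nrow = 2)
f2v <- function(l) vapply(l, function(x) x, FUN.VALUE = character(2))

# Hypothetical input: the second string contains extra "+" separators
x <- c("A+1", "B+2+3+4", "C+5")
l <- strsplit(x, "+", fixed = TRUE)

# Which elements do not have exactly 2 pieces?
bad <- which(vapply(l, length, FUN.VALUE = integer(1)) != 2L)

# f2 runs without complaint, but gains a column and shifts later entries
m <- f2(l)

# f2v stops with an error; capture its message instead of aborting
msg <- tryCatch(f2v(l), error = function(e) conditionMessage(e))
```

Here bad points at element 2, and m has 4 columns instead of 3, so the first row no longer holds only the leading strings.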
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Bert Gunter
> Sent: Thursday, April 25, 2013 7:54 AM
> To: Ted.Harding at wlandres.net
> Cc: R mailing list
> Subject: Re: [R] Decomposing a List
>
> Well, what you really want to do is convert the list to a matrix, and
> it can be done directly and considerably faster than with the
> (implicit) looping of sapply:
>
> f1 <- function(l)sapply(l,"[",1)
> f2 <- function(l)matrix(unlist(l),nr=2)
> l <- strsplit(paste(sample(LETTERS,1e6,rep=TRUE),sample(1:10,1e6,rep=TRUE),sep="+"),"+",fix=TRUE)
>
> ## Then you get these results:
>
> > system.time(x1 <- f1(l))
> user system elapsed
> 1.92 0.01 1.95
> > system.time(x2 <- f2(l))
> user system elapsed
> 0.06 0.02 0.08
> > system.time(x2 <- f2(l)[1,])
> user system elapsed
> 0.1 0.0 0.1
> > identical(x1,x2)
> [1] TRUE
>
>
> Cheers,
> Bert
>
>
>
>
>
>
> On Thu, Apr 25, 2013 at 3:32 AM, Ted Harding <Ted.Harding at wlandres.net> wrote:
> > Thanks, Jorge, that seems to work beautifully!
> > (Now to try to understand why ... but that's for later).
> > Ted.
> >
> > On 25-Apr-2013 10:21:29 Jorge I Velez wrote:
> >> Dear Dr. Harding,
> >>
> >> Try
> >>
> >> sapply(L, "[", 1)
> >> sapply(L, "[", 2)
> >>
> >> HTH,
> >> Jorge.-
> >>
> >>
> >>
> >> On Thu, Apr 25, 2013 at 8:16 PM, Ted Harding <Ted.Harding at wlandres.net> wrote:
> >>
> >>> Greetings!
> >>> For some reason I am not managing to work out how to do this
> >>> (in principle) simple task!
> >>>
> >>> As a result of applying strsplit() to a vector of character strings,
> >>> I have a long list L (N elements), where each element is a vector
> >>> of two character strings, like:
> >>>
> >>> L[1] = c("A1","B1")
> >>> L[2] = c("A2","B2")
> >>> L[3] = c("A3","B3")
> >>> [etc.]
> >>>
> >>> From L, I wish to obtain (as directly as possible, e.g. avoiding
> >>> a loop) two vectors each of length N where one contains the strings
> >>> that are first in the pair, and the other contains the strings
> >>> which are second, i.e. from L (as above) I would want to extract:
> >>>
> >>> V1 = c("A1","A2","A3",...)
> >>> V2 = c("B1","B2","B3",...)
> >>>
> >>> Suggestions?
> >>>
> >>> With thanks,
> >>> Ted.
> >>>
> >>> -------------------------------------------------
> >>> E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
> >>> Date: 25-Apr-2013 Time: 11:16:46
> >>> This message was sent by XFMail
> >>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >
> > -------------------------------------------------
> > E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
> > Date: 25-Apr-2013 Time: 11:31:57
> > This message was sent by XFMail
> >
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-
> biostatistics/pdb-ncb-home.htm
>