[R] How to split a factor (unique identifier) into several others?

Greg Snow Greg.Snow at imail.org
Thu Feb 7 20:02:38 CET 2008


The essence of do.call is to call the named function (rbind in this
case) with the elements of the list as its arguments.

In this case, with a list without named elements, the following:

> do.call('myfunction',mylist)

Is equivalent to

> myfunction( mylist[[1]], mylist[[2]], mylist[[3]], ..., mylist[[n]] )

With the ... replaced by however many additional elements there are (you
can see how it can save a lot of typing).
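
As a small sketch of that equivalence (the list "mylist" and its contents
are made up here purely for illustration, they are not from the original
question), something like the following should show that the two forms
give the same result. The last lines show what happens when the list
does have named elements: the names are used as argument names.

```r
# Hypothetical list with three unnamed elements (illustration only)
mylist <- list(1:3, 4:6, 7:9)

# Call rbind through do.call ...
via_do_call <- do.call("rbind", mylist)

# ... and by typing every element out by hand
by_hand <- rbind(mylist[[1]], mylist[[2]], mylist[[3]])

# Both are the same 3 x 3 matrix
identical(via_do_call, by_hand)

# With a named element, the name becomes the argument name,
# so this is the same as paste("a", "b", sep = "-")
joined <- do.call("paste", list("a", "b", sep = "-"))
joined
```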

So with rbind, it just rbinds together the elements of the list; that
is, it uses each element (the pieces split from one original string) as
a row of a new object, in this case a matrix.  The as.data.frame call
then converts the matrix to a data frame, whose columns become factors.
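
Here is a minimal sketch of those two steps, using three of the strings
from the example in this thread. Two things in it are my own additions
and not part of Dimitris' code: the explicit stringsAsFactors = TRUE
(since R 4.0.0 the default is FALSE, so without it the columns stay
character) and the column names, which are taken from the wording of
the original question.

```r
# Three of the identifier strings from the thread
x <- c("sample1_condition1_place1",
       "sample2_condition1_place1",
       "sample3_condition1_place1")

# One character vector of length 3 per original string
vals <- strsplit(x, "_")

# Each list element becomes one row of a character matrix
m <- do.call("rbind", vals)
dim(m)

# Convert to a data frame; stringsAsFactors = TRUE is an explicit choice
# here (assumption: a current R version where the default is FALSE)
df <- as.data.frame(m, stringsAsFactors = TRUE)

# Column names chosen to match the original question (illustrative)
names(df) <- c("sample", "condition", "place")
str(df)
```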

Does this help the understanding? 

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
(801) 408-8111
 
 

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Tribo Laboy
> Sent: Thursday, February 07, 2008 2:33 AM
> To: Dimitris Rizopoulos
> Cc: r-help at r-project.org
> Subject: Re: [R] How to split a factor (unique identifier) 
> into several others?
> 
> Hi Dimitris,
> 
> 
> Your code works like a charm, but I don't really understand 
> how. If you have some time, I'd appreciate it if you could explain 
> some more.
> 
> The contents of "vals" in your example is equivalent to the 
> contents of "splitfctr" in mine.
> 
> "as.data.frame" is quite clear, but "do.call("rbind", vals)" 
> has me puzzled.
> 
> I checked the "do.call" help, but I could not replicate the 
> results on the command line by directly using "rbind".
> 
> If I had to do it by directly using "rbind" can you show me 
> how to do it?
> 
> 
> I really appreciate your help.
> 
> 
> In the meantime I came up with another solution, which is 
> much more clunky than yours, but at least I can understand 
> how it works. I am putting it here, just as an additional 
> thing for the archives.
> 
> After "splitfctr" (or "vals" in Dimitris' example) is obtained,
> 
> I use the "unlist" function on the list and then make new 
> factors like this:
> 
> all_fctrs <- unlist(splitfctr)
> sample_fctr <- factor(all_fctrs[seq(1, length(all_fctrs), 3)])
> condition_fctr <- factor(all_fctrs[seq(2, length(all_fctrs), 3)])
> place_fctr <- factor(all_fctrs[seq(3, length(all_fctrs), 3)])
> 
> Then I bundle the factors into the data frame with "cbind".
> 
> 
> Thanks for the help.
> 
> TL
> 
> 
> 
> On Thu, Feb 7, 2008 at 5:20 PM, Dimitris Rizopoulos 
> <dimitris.rizopoulos at med.kuleuven.be> wrote:
> > try the following:
> >
> >  dat <- data.frame(x = c("sample1_condition1_place1",
> >     "sample2_condition1_place1", "sample3_condition1_place1",
> >     "sample1_condition2_place1", "sample1_condition2_place1"))
> >
> >  vals <- strsplit(as.character(dat$x), "_")
> >  as.data.frame(do.call("rbind", vals))
> >
> >
> >  I hope it helps.
> >
> >  Best,
> >  Dimitris
> >
> >  ----
> >  Dimitris Rizopoulos
> >  Ph.D. Student
> >  Biostatistical Centre
> >  School of Public Health
> >  Catholic University of Leuven
> >
> >  Address: Kapucijnenvoer 35, Leuven, Belgium
> >  Tel: +32/(0)16/336899
> >  Fax: +32/(0)16/337015
> >  Web: http://med.kuleuven.be/biostat/
> >      http://www.student.kuleuven.be/~m0390867/dimitris.htm
> >
> >
> >
> >
> >  ----- Original Message -----
> >  From: "Tribo Laboy" <tribolaboy at gmail.com>
> >  To: <r-help at r-project.org>
> >  Sent: Thursday, February 07, 2008 7:44 AM
> >  Subject: [R] How to split a factor (unique identifier) into several others?
> >
> >
> >  > Hello,
> >  >
> >  > I have a data frame with a factor column, which uniquely identifies
> >  > the observations in the data frame and it looks like this:
> >  >
> >  > sample1_condition1_place1
> >  > sample2_condition1_place1
> >  > sample3_condition1_place1
> >  > .
> >  > .
> >  > .
> >  > sample3_condition3_place3
> >  >
> >  > I want to turn it into three separate factor columns "sample",
> >  > "condition" and "place".
> >  >
> >  > This is what I did so far:
> >  >
> >  > # generate a factor column for the example
> >  > fctr <- factor(c("sample1_condition1_place1",
> >  > "sample2_condition1_place1", "sample3_condition1_place1"))
> >  > splitfctr <- strsplit(as.character(fctr),"_")
> >  >
> >  >> splitfctr
> >  > [[1]]
> >  > [1] "sample1"    "condition1" "place1"
> >  >
> >  > [[2]]
> >  > [1] "sample2"    "condition1" "place1"
> >  >
> >  > [[3]]
> >  > [1] "sample3"    "condition1" "place1"
> >  >
> >  >
> >  > Now this is all fine, but how do I make three separate factors of
> >  > this?
> >  > The object "splitfctr" is a list of character vectors, each
> >  > character vector being composed of the words after splitting the
> >  > long original word.
> >  > Now I want to form new character vectors, which contain the first
> >  > component of each list entry, then another vector for the second
> >  > component, etc.
> >  > I don't want to use loops, unless that's the only way to do it. I
> >  > guess I have some difficulty with understanding how R indexing
> >  > works...
> >  >
> >  > ______________________________________________
> >  > R-help at r-project.org mailing list
> >  > https://stat.ethz.ch/mailman/listinfo/r-help
> >  > PLEASE do read the posting guide
> >  > http://www.R-project.org/posting-guide.html
> >  > and provide commented, minimal, self-contained, reproducible code.
> >  >
> >
> >
> >  Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
> >
> >
> 


