[R] transforming a badly organized data base into a list of data frames

Gabor Grothendieck ggrothendieck at gmail.com
Fri Sep 4 15:29:15 CEST 2009


Try this:

x2 <- subset(x, A != "")
split(x2, cumsum(x2$B == ""))



On Fri, Sep 4, 2009 at 8:57 AM, Dimitri Liakhovitski<ld7631 at gmail.com> wrote:
> Dear R-ers!
>
> I have a badly organized data base in Excel. Once I read it into R it
> looks like this (all variables become factors because of many spaces
> and other characters in Excel):
>
> x<-data.frame(A=c("","Name1","text1","text2","text3","","","","Name2","text1","text2","text3","","","Name3","text1","text2","text3","","Name1","text1","text2","text3",""),
> B=c("","",1,2,3,"","","","",4,5,6,"","","",7,8,9,"","",3,3,3,""),C=c("","",1,1,2,"","","","",5,6,5,"","","",9,8,7,"","",2,2,2,""),
> D=c("","",3,2,1,"","","","",6,5,4,"","","",7,7,8,"","",1,1,1,""))
> x
>
> I was wondering if it's at all possible to grab the data blocks from
> this file and put them into a list, e.g., so that the first element of
> the list is the data frame:
>
> Name1 1 1 3
> Name12 1 2
> Name13 2 1
>
> second element of the list is the data frame
>
> Name2 4 5 6
> Name2 5 6 5
> Name2 6 5 4
>
> etc.
> Notice that all blocks have the same number of rows.
>
> I have several challenges:
> 1. I cannot use the row number because the number of irrelevant lines
> between blocks is always different.
> 2. I'd love to replace all the "text1", "text2", "text3" strings
> (which are always the same within each block) with the name that is
> one cell above it.
> 3. All variables are now factors - I am afraid if I recode them later
> as "as numeric" - they numbers will change their values.
>
> Thank you very much for your suggestions!
>
> --
> Dimitri Liakhovitski
> Ninah.com
> Dimitri.Liakhovitski at ninah.com
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>




More information about the R-help mailing list