[R] Summarize data for MCA (FactoMineR)

David Winsemius dwinsemius at comcast.net
Sun Apr 27 17:10:19 CEST 2008


"Nelson Castillo" <nelsoneci at gmail.com> wrote in
news:2accc2ff0804251655o32686b99j73cf7df37243d08f at mail.gmail.com: 

> Hi :-)
> 
> I'm new to R and I started using it for a project (I'm the CS guy in
> a group of statisticians, helping them find out how to solve issues
> as they come up). This is my first post to the list and I am
> starting to learn R. 
> 
> Well, they were used to doing MCA analysis in other programs where
> the data seems to be preprocessed automatically before running MCA.
> 
> So, they need to process a data set with N=1000000 elements, of
> which only about N/100 are distinct over all the variables, so the
> MCA can be run in reasonable time if the data are summarized first.
> 
> So, the question is:
> 
> How can I turn x from:
> 
> x <-
> structure(list(weight = c(1, 1, 2, 1, 2), var1 = structure(c(1L,
> 1L, 1L, 1L, 2L), .Label = c("A", "C"), class = "factor"), var2 =
> structure(c(1L,
> 1L, 1L, 1L, 2L), .Label = c("B", "D"), class = "factor")), .Names =
> c("weight", "var1", "var2"), row.names = c(NA, 5L), class =
> "data.frame") 
> 
> to:
> 
> y <-
> structure(list(weight = c(5L, 2L), var1 = structure(1:2, .Label =
> c("A", "C"), class = "factor"), var2 = structure(1:2, .Label =
> c("B", "D"), class = "factor")), .Names = c("weight", "var1", "var2"
> ), class = "data.frame", row.names = c(NA, -2L))
> 
> using R?
> 
> That is, from:
> 
>> x
>   weight var1 var2
> 1      1    A    B
> 2      1    A    B
> 3      2    A    B
> 4      1    A    B
> 5      2    C    D
> 
> to:
> 
>> y
>   weight var1 var2
> 1      5    A    B
> 2      2    C    D
> 

Does this suffice?

s.wt <- with(x,
             aggregate(weight, by = list(var1 = var1, var2 = var2), FUN = sum))
#> s.wt
#  var1 var2 x
#1    A    B 5
#2    C    D 2

#then fix names
names(s.wt)[3] <- "weight"

#> s.wt
#  var1 var2 weight
#1    A    B      5
#2    C    D      2

I believe that the reshape or reShape packages could do this in one 
step.
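
For what it's worth, recent versions of base R also seem to manage it
in one step through the formula interface of aggregate(), which keeps
the column name "weight" so no renaming is needed. A minimal sketch,
assuming the same toy data as in the question; the name s.wt2 is made
up here for illustration, and x is rebuilt with data.frame() only so
the snippet runs on its own:

```r
# Rebuild the example data from the question (same values as x).
x <- data.frame(
  weight = c(1, 1, 2, 1, 2),
  var1   = factor(c("A", "A", "A", "A", "C")),
  var2   = factor(c("B", "B", "B", "B", "D"))
)

# Sum weight within each distinct (var1, var2) combination.
# The formula method names the result column after the left-hand side.
s.wt2 <- aggregate(weight ~ var1 + var2, data = x, FUN = sum)

# Put weight first to match the column order asked for.
s.wt2 <- s.wt2[, c("weight", "var1", "var2")]
print(s.wt2)
```

Only combinations that actually occur in the data are returned, so the
result should have one row per distinct (var1, var2) pair.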


-- 
David Winsemius


> 
> The idea is that there is one occurrence of "A B" repeated 4 times
> in the original table,
> and it is summarized in the second table, computing the sum of the
> weights. 
> 
> I solved the problem using Perl, but I'd like to know what I have to
> read in order to
> do it in R.
> 
> Regards,
> Nelson.-
>


