[R] Summarize data for MCA (FactoMineR)
David Winsemius
dwinsemius at comcast.net
Sun Apr 27 17:10:19 CEST 2008
"Nelson Castillo" <nelsoneci at gmail.com> wrote in
news:2accc2ff0804251655o32686b99j73cf7df37243d08f at mail.gmail.com:
> Hi :-)
>
> I'm new to R and I started using it for a project (I'm the CS guy in
> a group of statisticians helping them find out how to solve issues
> as they come out). This is my first post to the list and I am
> starting to learn R.
>
> Well, they were used to doing MCA analysis in other programs where
> the data seems to be preprocessed automatically before running MCA.
>
> So, they need to process a data set with N=1000000 elements, of
> which only about N/100 are distinct over all the variables, so the
> MCA can be run in reasonable time on the summarized data.
>
> So, the question is:
>
> How can I turn x from:
>
> x <-
> structure(list(weight = c(1, 1, 2, 1, 2), var1 = structure(c(1L,
> 1L, 1L, 1L, 2L), .Label = c("A", "C"), class = "factor"), var2 =
> structure(c(1L,
> 1L, 1L, 1L, 2L), .Label = c("B", "D"), class = "factor")), .Names =
> c("weight", "var1", "var2"), row.names = c(NA, 5L), class =
> "data.frame")
>
> to:
>
> y <-
> structure(list(weight = c(5L, 2L), var1 = structure(1:2, .Label =
> c("A", "C"), class = "factor"), var2 = structure(1:2, .Label =
> c("B", "D"), class = "factor")), .Names = c("weight", "var1", "var2"
> ), class = "data.frame", row.names = c(NA, -2L))
>
> using R?
>
> That is, from:
>
>> x
>   weight var1 var2
> 1      1    A    B
> 2      1    A    B
> 3      2    A    B
> 4      1    A    B
> 5      2    C    D
>
> to:
>
>> y
>   weight var1 var2
> 1      5    A    B
> 2      2    C    D
>
Does this suffice?
s.wt <- with(x,
             aggregate(weight, by=list(var1=var1,var2=var2), sum)
             )
#> s.wt
#  var1 var2 x
#1    A    B 5
#2    C    D 2
#then fix names
names(s.wt)[3] <- "weight"
#> s.wt
#  var1 var2 weight
#1    A    B      5
#2    C    D      2
I believe that the reshape or reShape packages could do this in one
step.
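
As a hedged aside: in versions of R whose aggregate() offers a formula interface, the two steps above can probably be collapsed into one call, because the formula method keeps the name of the summed column. The sketch below rebuilds the example data with data.frame() rather than structure() purely for readability, and the final column reordering is only an assumption about the layout wanted for y.

```r
# Rebuild the example data from the question.
x <- data.frame(
  weight = c(1, 1, 2, 1, 2),
  var1   = factor(c("A", "A", "A", "A", "C")),
  var2   = factor(c("B", "B", "B", "B", "D"))
)

# Sum 'weight' within each distinct (var1, var2) combination.
# The formula method names the result column 'weight', so no renaming is needed.
s.wt <- aggregate(weight ~ var1 + var2, data = x, FUN = sum)

# Put 'weight' first to match the column order shown for y in the question.
y <- s.wt[, c("weight", "var1", "var2")]
print(y)
```

The grouping variables stay factors, so the summarized table can be passed on in the same form as the original data, with the weight column carrying the counts.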
--
David Winsemius
>
> The idea is that there is one occurrence of "A B" repeated 4 times
> in the original table,
> and it is summarized in the second table, computing the sum of the
> weights.
>
> I solved the problem using Perl, but I'd like to know what I have to
> read in order to
> do it in R.
>
> Regards,
> Nelson.-
>