[R] Weird behavior of aggregate() function

Ista Zahn istazahn at gmail.com
Mon Jan 26 17:51:08 CET 2015


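?aggregate tells you that unless x is a time series, it is coerced to
a data frame. data.frame() converts your character vector to a factor
unless you tell it not to. Your function then combines factor elements
with c(), which keeps only the integer codes, hence the integers in
your output.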

You can prevent this by converting vari to a data.frame yourself,
passing the stringsAsFactors argument, like this:

aggregate(data.frame(TE = vari, stringsAsFactors = FALSE),
          by = list(gr), faire.paires)
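
To see the conversion for yourself, here is a minimal sketch that reuses
gr, vari and faire.paires from your post. The names as_factor and
as_char are only illustrative, and stringsAsFactors is set explicitly in
both calls so the result does not depend on the default of whichever R
version is installed:

```r
gr   <- c("A","A","B","B","C","C","C","D","D","E","E","E")
vari <- c("rs2","rs2","mj2","mj1","rs1","rs1","rs2","mj1","mj1","rs1","mj1","mj2")

# Same function as in the original post
faire.paires <- function(TE) {
  gg <- rbind(c(TE[1], TE[2]),
              c(TE[1], TE[3]))
  gg[rowSums(is.na(gg)) == 0, , drop = FALSE]
}

# Illustrative names: the same column, stored as factor and as character
as_factor <- data.frame(TE = vari, stringsAsFactors = TRUE)
as_char   <- data.frame(TE = vari, stringsAsFactors = FALSE)
class(as_factor$TE)
class(as_char$TE)

# Aggregating the character version keeps the original labels
res <- aggregate(as_char, by = list(gr), faire.paires)
str(res)
```

If you would rather not depend on how the data frame is built, another
option is probably to call as.character(TE) at the top of faire.paires,
so the function works on labels whatever it is handed.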

Best,
Ista

On Mon, Jan 26, 2015 at 11:30 AM,
<Bastien.Ferland-Raymond at mffp.gouv.qc.ca> wrote:
>
> Hello list,
>
> I have found a weird behavior of the aggregate() function when used with characters. I think the problem has to do with converting characters to factors.
>
> I'm trying to aggregate a character vector using a homemade function.  My function returns all the possible pairs of modalities observed.
>
>
> Reproducible code:
>
> #######
> ### my grouping variable
> gr <- c("A","A","B","B","C","C","C","D","D","E","E","E")
> ### my variable
> vari <- c("rs2","rs2","mj2","mj1","rs1","rs1","rs2","mj1","mj1","rs1","mj1","mj2")
>
> ### what the table would look like
> cbind(gr,vari)
>
> ###  My function that gives every possible pair of values (my real function can go up to length(TE)==5, but for the sake of the example, I've reduced it here)
> faire.paires <- function(TE){
> gg <- rbind(c(TE[1],TE[2]),
>             c(TE[1],TE[3]))
> gg <- gg[rowSums(is.na(gg))==0,,drop=FALSE]
> gg
> }
>
> ###  The function gives exactly what I want when I run it on a specific entry
> faire.paires(TE = vari[gr=="B"])
>
> ###  But with aggregate(), it transforms everything into integers
> res <- aggregate(list(TE = vari), by=list(gr),faire.paires)
> res
> str(res)
>
> ###  it's as if it used a factor and then lost the key telling me which integer
> ###  means which modality
>
>
> ###  if I give it directly factors:
> res2 <- aggregate(list(TE = as.factor(vari)), by=list(gr),faire.paires)
> res2
> str(res2)
>
> ###  does not fix the problem.
> ############
>
> Any idea?
>
> I know my function may not be the best or most efficient way to do this. However, I'm still puzzled as to
> why aggregate gives me this weird output.
>
> Best regards,
>
> Bastien Ferland-Raymond, M.Sc. Stat., M.Sc. Biol.
> Division des orientations et projets spéciaux
> Direction des inventaires forestiers
> Ministère des Forêts, de la Faune et des Parcs
>


