[R] meaning of formula in aggregate function
ehlers at ucalgary.ca
Sat Jan 22 16:36:53 CET 2011
> Dear R community
> Recently, Henrique Dallazuanna kindly saved me by solving the
> following data transformation problem:
> (n_, _n, j_, k_ signify numbers)
> SOURCE DATA:
> id  cycle1  cycle2  cycle3  …  cycle_n
> 1   c       c       c          c
> 1   m       m       m          m
> 1   f       f       f          f
> 2   m       m       m          NA
> 2   f       f       f          NA
> 2   c       c       c          NA
> 3   a       a       NA         NA
> 3   c       c       c          NA
> 3   f       f       f          NA
> 3   NA      NA      m          NA
> Q: How to transform source data to:
> RESULT DATA:
> id  cyc1  cyc2  cyc3  …  cyc_n
> 1   cfm   cfm   cfm      cfm
> 2   cfm   cfm   cfm
> 3   acf   acf   cfm
> Henrique's solution is:
> aggregate(. ~ id, lapply(df, as.character),
>           FUN = function(x) paste(sort(x), collapse = ''),
>           na.action = na.pass)
> Could somebody EXPLAIN HOW IT WORKS?
> Henrique really did save my investigation.
> However, since I am about to analyse cancer chemotherapy in
> 500 patients, it would be nice to know what I am actually doing.
> 1. All the help says about the LHS in formulas like '. ~ id' is that
> its name is "dot notation", and not a single word more. Thus, I have
> no clue what the dot in that formula really means.
Well, ?aggregate does (rather gently) point you to the
help page for _formula_, where you will find quite a few
words about the use of '.' in the Details section.
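
As a small illustration of what those help pages describe: in the formula method of aggregate(), a '.' on the left-hand side stands for all columns of the data that do not appear on the right-hand side. The sketch below uses a made-up data frame (toy, with columns x and y, none of which come from the original post) and compares the dot form with the explicit cbind() form.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(id = c(1, 1, 2, 2),
                  x  = c(1, 2, 3, 4),
                  y  = c(10, 20, 30, 40))

# '.' on the LHS: every column that is not on the RHS (here: x and y)
with_dot <- aggregate(. ~ id, data = toy, FUN = sum)

# The same request with the columns spelled out
with_cbind <- aggregate(cbind(x, y) ~ id, data = toy, FUN = sum)

print(with_dot)
print(with_cbind)
```

Both calls should give one row per id, with x and y summed within each group.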
> 2. help says:
> Note that ‘paste()’ coerces ‘NA_character_’, the character missing
> value, to ‘"NA"’
> And at the same time:
> ‘na.pass’ returns the object unchanged.
> I am happy, that I don't have NAs in mydata. I just don't understand
> how it happened.
I don't understand what you're asking.
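
If the question is why no "NA" strings show up in the result, one possible reading is this: sort() drops missing values by default (its na.last argument defaults to NA), so by the time paste() is called there are no NAs left to coerce. A minimal sketch, using a made-up vector x:

```r
# Hypothetical input: one group's values for one cycle, including an NA
x <- c("a", "c", "f", NA)

sorted <- sort(x)                          # NA is removed by sort()
pasted_sorted <- paste(sorted, collapse = "")
pasted_raw    <- paste(x, collapse = "")   # no sort(): NA becomes "NA"

print(pasted_sorted)   # "acf"
print(pasted_raw)      # "acfNA"

# A group that is entirely NA sorts to an empty vector, which pastes to ""
all_na <- paste(sort(c(NA_character_, NA_character_)), collapse = "")
print(all_na)          # ""
```

That would also account for the empty cells in the RESULT DATA above.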
> 3. I can't see the real difference between 'FUN = function(x) paste(x)'
> and 'FUN = paste'. However, the former works perfectly while the latter
> simply does not.
That's not quite true. You're using paste(sort(x)) and not
just x in Henrique's solution. And that's precisely
the point: when a function is not 'simple', you need to
define it. Henrique is defining it 'on the fly'; you
could also define it separately before the aggregate()
call and then use it like this:
  myfun <- function(x) paste(sort(x), collapse = '')
  aggregate(...., FUN = myfun, ....)
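
For completeness, here is a self-contained sketch of the separately defined function applied to the SOURCE DATA from the question. It assumes only three cycle columns (the original has an unspecified number) and builds the data frame with character columns directly, so the lapply(df, as.character) step is not needed here. The last call is an addition for comparison: it shows what the default na.action would do.

```r
# SOURCE DATA from the question, restricted to three cycles (an assumption)
df <- data.frame(
  id     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  cycle1 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle2 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle3 = c("c", "m", "f", "m", "f", "c", NA,  "c", "f", "m"),
  stringsAsFactors = FALSE
)

# Sort the values of one group (sort() drops NAs) and glue them together
myfun <- function(x) paste(sort(x), collapse = "")

# na.pass keeps rows that contain NAs, so myfun sees every row of a group
res <- aggregate(. ~ id, data = df, FUN = myfun, na.action = na.pass)
print(res)

# With the default na.action (na.omit), rows with any NA are dropped
# before grouping, so id 3 loses its 'a' and 'm' entries
res_omit <- aggregate(. ~ id, data = df, FUN = myfun)
print(res_omit)
```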
> All I can follow from the code above is that R breaks the data into
> groups with the same id, then tears each little 'cycle' piece into
> separate characters, then sorts them and puts these characters back
> together within the same id for each 'cycle'. What I miss is how R puts
> all this mess back into a nice data frame. The NAs are also a question,
> as I said before.
> Could you please shed some light on this, if you don't mind answering
> these naive questions?
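
On the reassembly question, it may help to do the same thing by hand. The sketch below is only conceptually similar to aggregate() and is not a description of its internals: each cycle column is split by id, the function is applied to every piece, and the per-column results are bound into a data frame with one row per id. The data are a shortened, made-up version of the example, and by_hand is a hypothetical name.

```r
# Shortened, hypothetical version of the example data
df <- data.frame(
  id     = c(1, 1, 1, 3, 3, 3, 3),
  cycle1 = c("c", "m", "f", "a", "c", "f", NA),
  cycle3 = c("c", "m", "f", NA,  "c", "f", "m"),
  stringsAsFactors = FALSE
)

myfun <- function(x) paste(sort(x), collapse = "")

# For each non-id column: split by id, apply myfun to each piece
per_column <- lapply(df[-1], function(col) {
  unname(sapply(split(col, df$id), myfun))
})

# One row per id, one column per cycle
by_hand <- data.frame(id = sort(unique(df$id)), per_column,
                      stringsAsFactors = FALSE)
print(by_hand)
```

Each group yields a single string per column, which is why the result has as many rows as there are distinct ids.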