[R] meaning of formula in aggregate function
d.kazakiewicz at gmail.com
Sat Jan 22 13:44:59 CET 2011
Dear R community
Recently, Henrique Dallazuanna literally saved me by solving a data
transformation problem, which is as follows:
(n_, _n, j_, k_ signify numbers)
id  cycle1  cycle2  cycle3  …  cycle_n
1   c       c       c          c
1   m       m       m          m
1   f       f       f          f
2   m       m       m          NA
2   f       f       f          NA
2   c       c       c          NA
3   a       a       NA         NA
3   c       c       c          NA
3   f       f       f          NA
3   NA      NA      m          NA
Q: How to transform source data to:
id  cyc1  cyc2  cyc3  …  cyc_n
1   cfm   cfm   cfm      cfm
2   cfm   cfm   cfm
3   acf   acf   cfm
Henrique's solution is:
aggregate(. ~ id, lapply(df, as.character),
          FUN = function(x) paste(sort(x), collapse = ''),
          na.action = na.pass)
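For reference, a minimal reproducible sketch of the call above. The data frame is a hypothetical reconstruction of the sample table, with a made-up `cycle4` column standing in for cycle_n; the name `res` is also just an illustration.

```r
# Hypothetical reconstruction of the sample data (cycle4 stands in for cycle_n).
df <- data.frame(
  id     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  cycle1 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle2 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle3 = c("c", "m", "f", "m", "f", "c", NA, "c", "f", "m"),
  cycle4 = c("c", "m", "f", NA, NA, NA, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Same call as in the post: one row per id, one collapsed string per cycle.
res <- aggregate(. ~ id, lapply(df, as.character),
                 FUN = function(x) paste(sort(x), collapse = ""),
                 na.action = na.pass)
print(res)
```

Groups whose values are all NA in a column should come out as empty strings rather than as "NA".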
Could somebody EXPLAIN HOW IT WORKS?
I mean, Henrique really did save my investigation.
However, considering that I am about to investigate cancer
chemotherapy in 500 patients, it would be nice to know what
I am actually doing.
1. All the help says about the LHS in formulas like '.~id' is that it
is called "dot notation", and not a single word more. Thus, I have no
clue what the dot in that formula really means.
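One way to probe the dot is to compare it with a spelled-out left-hand side. In the formula interface of aggregate(), a dot on the left appears to stand for "every column of the data that is not named on the right". The sketch below uses a small made-up data frame, and `collapse_sorted` is a hypothetical helper name.

```r
# Small made-up data frame: one grouping column and two value columns.
df <- data.frame(
  id     = c(1, 1, 2),
  cycle1 = c("c", "m", "f"),
  cycle2 = c("m", "c", "f"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort the values, then glue them into one string.
collapse_sorted <- function(x) paste(sort(x), collapse = "")

# Dot on the left-hand side: all columns of df except id.
with_dot <- aggregate(. ~ id, data = df, FUN = collapse_sorted)

# The same columns written out explicitly with cbind().
spelled_out <- aggregate(cbind(cycle1, cycle2) ~ id, data = df,
                         FUN = collapse_sorted)

print(with_dot)
print(spelled_out)
```

If the two printed results match, the dot is just shorthand for listing the remaining columns.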
2. The help says:
Note that ‘paste()’ coerces ‘NA_character_’, the character missing
value, to ‘"NA"’
And at the same time:
‘na.pass’ returns the object unchanged.
I am happy that I don't have NAs in mydata. I just don't understand
how that happened.
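A small experiment on a single vector may help with the NA puzzle. It suggests that the NAs vanish because sort() drops them by default (its `na.last` argument defaults to NA) before paste() ever sees them. The vectors here are made up for illustration.

```r
x <- c("a", "c", NA, "f")   # made-up values for one group of one cycle

# paste() alone turns the missing value into the literal text "NA".
pasted_raw <- paste(x, collapse = "")

# sort() drops NAs by default, so paste() only receives the real values.
sorted_x      <- sort(x)
pasted_sorted <- paste(sorted_x, collapse = "")

# A group with nothing but NAs sorts to an empty vector, which
# collapses to an empty string.
all_na        <- c(NA_character_, NA_character_)
pasted_all_na <- paste(sort(all_na), collapse = "")

print(pasted_raw)
print(sorted_x)
print(pasted_sorted)
print(pasted_all_na)
```

Under this reading, `na.action = na.pass` only keeps the rows with NAs from being discarded before grouping; removing the NAs is then left to sort() inside FUN.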
3. I can't see the real difference between 'FUN = function(x) paste(x)'
and 'FUN = paste'. However, the former works perfectly while the latter
simply does not.
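The difference probably lies in what the wrapper adds rather than in paste() itself: a bare paste() returns one string per element, while the wrapper in the solution sorts and then collapses to a single string per group. A minimal sketch with made-up values:

```r
x <- c("m", "c", "f")   # one made-up group of one cycle column

many <- paste(x)                        # no collapse: one string per element
one  <- paste(sort(x), collapse = "")   # sorted, then glued into one string

print(length(many))
print(length(one))
print(one)

# aggregate() forwards extra arguments to FUN, so collapse can also be
# passed directly; this glues the values together but does not sort them.
df <- data.frame(id = c(1, 1, 1), cycle1 = x, stringsAsFactors = FALSE)
unsorted <- aggregate(cycle1 ~ id, data = df, FUN = paste, collapse = "")
print(unsorted)
```

So `FUN = paste` on its own leaves each group as several separate strings, which is not the one-value-per-cell result wanted here.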
All I can gather from the code above is that R breaks the data into groups
with the same id, then tears each little 'cycle' piece into separate
characters, then sorts them and puts these characters back together within
the same id for each 'cycle'. I don't see how R puts all this mess back
together into a nice long-format data frame. The NAs are also a question,
as I said before.
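The split, apply and reassemble steps described above can be imitated by hand. This is only a sketch of the idea on made-up data, not the actual internals of aggregate(); `collapse_sorted`, `cycle_cols`, `groups`, `rows` and `by_hand` are hypothetical names.

```r
# Made-up data in the same shape as the sample table.
df <- data.frame(
  id     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  cycle1 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle2 = c("c", "m", "f", "m", "f", "c", NA, "c", "f", "m"),
  stringsAsFactors = FALSE
)

collapse_sorted <- function(x) paste(sort(x), collapse = "")

# 1. Split the cycle columns into one small data frame per id.
cycle_cols <- setdiff(names(df), "id")
groups <- split(df[cycle_cols], df$id)

# 2. Within each group, reduce every column to a single string.
rows <- lapply(groups, function(g) vapply(g, collapse_sorted, character(1)))

# 3. Stack the per-group results back into one data frame, one row per id.
by_hand <- data.frame(id = names(groups), do.call(rbind, rows),
                      stringsAsFactors = FALSE)
print(by_hand)
```

Because FUN returns exactly one value per group and column, the pieces line up as one row per id, which is how the result ends up as a tidy data frame again.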
Could you please shed some light on this, if you don't mind answering those
questions?