[R] how to deduplicate records, e.g. using melt() and cast()
Dimitris Rizopoulos
d.rizopoulos at erasmusmc.nl
Mon May 7 12:09:33 CEST 2012
You could try aggregate(), e.g.,
my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3),
                                rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))
aggregate(my.df[-1], my.df['pathway'], sum, na.rm = TRUE)
or, to keep NA (instead of 0) for a pathway that has no observed value
under a given condition:
sum. <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
aggregate(my.df[-1], my.df['pathway'], sum.)
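To make the difference between the two calls concrete, here is a small self-contained sketch that rebuilds the same data and keeps both results. The names res.zero and res.na are only illustrative. Note that summing is a safe way to collapse the duplicates here only because each pathway has at most one non-missing value per condition; with several observed values per group they would be added together.

```r
# Same data as in the example above, rebuilt so this block runs on its own
my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3),
                                rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))

# Plain sum with na.rm = TRUE: a group with only NAs sums to 0
res.zero <- aggregate(my.df[-1], my.df["pathway"], sum, na.rm = TRUE)

# Wrapper that returns NA when a group has no observed value at all
sum. <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
res.na <- aggregate(my.df[-1], my.df["pathway"], sum.)

res.zero$cond.three  # 0.0 0.1 0.0
res.na$cond.three    # NA  0.1 NA
```

The second result, res.na, is the one whose values match wanted.df in the original question.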
I hope it helps.
Best,
Dimitris
On 5/7/2012 11:50 AM, Karl Brand wrote:
> Esteemed UseRs,
>
> This must be embarrassingly trivial to achieve with e.g., melt() and
> cast(): deduplicating records ("pw.X" in example) for a given set of
> responses ("cond.Y" in example).
>
> Hopefully the runnable example shows clearly what I have and what I'm
> trying to convert it to. But I'm just not getting it, ?cast that is! So
> I'd really appreciate someone's patience to clarify this, using the
> reshape package, or any other approach.
>
> With sincere thanks in advance,
>
> Karl
>
>
> ## Runnable example
> ## The data.frame I have:
> library("reshape")
> my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3),
> rep("pw.C", 1)),
> cond.one = c(0.5, NA, 0.4, NA, NA, NA),
> cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
> cond.three = c(NA, NA, NA, NA, 0.1, NA))
> my.df
> ## The data.frame I want:
> wanted.df <- data.frame(pathway = c("pw.A", "pw.B", "pw.C"),
> cond.one = c(0.5, 0.4, NA),
> cond.two = c(0.6, 0.9, 0.2),
> cond.three = c(NA, 0.1, NA))
> wanted.df
>
>
--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/