[R] how to deduplicate records, e.g. using melt() and cast()

Karl Brand k.brand at erasmusmc.nl
Mon May 7 12:30:31 CEST 2012


Dimitris, Petra,

Thank you! aggregate() is my lesson for today, not melt() | cast()

Really appreciate the super fast help,

Karl

On 07/05/12 12:09, Dimitris Rizopoulos wrote:
> you could try aggregate(), e.g.,
>
> my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3),
>                                 rep("pw.C", 1)),
>                     cond.one = c(0.5, NA, 0.4, NA, NA, NA),
>                     cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
>                     cond.three = c(NA, NA, NA, NA, 0.1, NA))
>
>
> aggregate(my.df[-1], my.df['pathway'], sum, na.rm = TRUE)
>
> or, to get NA rather than 0 when every value in a group is NA,
>
> sum. <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
> aggregate(my.df[-1], my.df['pathway'], sum.)
>
>
> I hope it helps.
>
> Best,
> Dimitris
>
>
> On 5/7/2012 11:50 AM, Karl Brand wrote:
>> Esteemed UseRs,
>>
>> This must be embarrassingly trivial to achieve with e.g., melt() and
>> cast(): deduplicating records ("pw.X" in example) for a given set of
>> responses ("cond.Y" in example).
>>
>> Hopefully the runnable example shows clearly what I have and what I'm
>> trying to convert it to. But I'm just not getting it, ?cast that is! So
>> I'd really appreciate someone's patience to clarify this, using the
>> reshape package, or any other approach.
>>
>> With sincere thanks in advance,
>>
>> Karl
>>
>>
>> ## Runnable example
>> ## The data.frame I have:
>> library("reshape")
>> my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3),
>>                                 rep("pw.C", 1)),
>>                     cond.one = c(0.5, NA, 0.4, NA, NA, NA),
>>                     cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
>>                     cond.three = c(NA, NA, NA, NA, 0.1, NA))
>> my.df
>> ## The data frame I want:
>> wanted.df <- data.frame(pathway = c("pw.A", "pw.B", "pw.C"),
>>                         cond.one = c(0.5, 0.4, NA),
>>                         cond.two = c(0.6, 0.9, 0.2),
>>                         cond.three = c(NA, 0.1, NA))
>> wanted.df
>>
>>
>
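
For anyone reading this thread later, here is a minimal base R sketch that puts the two aggregate() calls side by side and compares the result with the wanted data frame. The names with.zeros, with.nas and sum.or.na are only illustrative and do not appear in the thread. It assumes, as in the example data, that each pathway has at most one non-NA value per condition; if a pathway had several, sum() would add them up rather than deduplicate them.

```r
# Example data from the thread: several rows per pathway, with at most one
# non-NA value per pathway and condition.
my.df <- data.frame(
  pathway    = c(rep("pw.A", 2), rep("pw.B", 3), rep("pw.C", 1)),
  cond.one   = c(0.5, NA, 0.4, NA, NA, NA),
  cond.two   = c(NA, 0.6, NA, 0.9, NA, 0.2),
  cond.three = c(NA, NA, NA, NA, 0.1, NA)
)

# Variant 1: sum() with na.rm = TRUE. A group whose values are all NA
# collapses to 0, because the sum of an empty set is 0.
with.zeros <- aggregate(my.df[-1], my.df["pathway"], sum, na.rm = TRUE)

# Variant 2: an illustrative wrapper that returns NA when a group has
# no observed value at all, and the NA-free sum otherwise.
sum.or.na <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
with.nas <- aggregate(my.df[-1], my.df["pathway"], sum.or.na)

# The target layout from the original question.
wanted.df <- data.frame(
  pathway    = c("pw.A", "pw.B", "pw.C"),
  cond.one   = c(0.5, 0.4, NA),
  cond.two   = c(0.6, 0.9, 0.2),
  cond.three = c(NA, 0.1, NA)
)

print(with.zeros)
print(with.nas)

# Compare column by column; pathway is converted to character so the check
# does not depend on whether data.frame() created a factor.
same.values <- all(
  identical(as.character(with.nas$pathway), as.character(wanted.df$pathway)),
  isTRUE(all.equal(with.nas$cond.one, wanted.df$cond.one)),
  isTRUE(all.equal(with.nas$cond.two, wanted.df$cond.two)),
  isTRUE(all.equal(with.nas$cond.three, wanted.df$cond.three))
)
print(same.values)
```

Only the second variant reproduces the NA entries of wanted.df; the first one is fine when a 0 is an acceptable stand-in for "no value observed".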

-- 
Karl Brand
Dept of Cardiology and Dept of Bioinformatics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
T +31 (0)10 703 2460 |M +31 (0)642 777 268 |F +31 (0)10 704 4161