[R] identify duplicate entries in data frame and calculate mean

Matthew mccormack at molbio.mgh.harvard.edu
Tue May 24 22:37:32 CEST 2016


Thank you very much, Dan.

These work great. Two more great answers to my question.

Matthew

On 5/24/2016 4:15 PM, Nordlund, Dan (DSHS/RDA) wrote:
> You have several options.
>
> 1.  You could use the aggregate function.  If your data frame is called DF, you could do something like
>
> with(DF, aggregate(Length, list(Identifier), mean))
>
> 2.  You could use the dplyr package like this
>
> library(dplyr)
> summarize(group_by(DF, Identifier), mean(Length))
>
>
> Hope this is helpful,
>
> Dan
>
> Daniel Nordlund, PhD
> Research and Data Analysis Division
> Services & Enterprise Support Administration
> Washington State Department of Social and Health Services
>
>
>> -----Original Message-----
>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Matthew
>> Sent: Tuesday, May 24, 2016 12:47 PM
>> To: r-help at r-project.org
>> Subject: [R] identify duplicate entries in data frame and calculate mean
>>
>> I have a data frame with 10 columns.
>> The last column holds an alphanumeric identifier.
>> For most rows this identifier is unique within the file, but some
>> identifiers occur in duplicate, triplicate or more (let's call them
>> multiplicates). When an identifier occurs more than once, its rows are
>> always consecutive.
>>
>> Column 7 holds an integer (it may or may not be unique; that does not
>> matter).
>>
>> I want to identify each set of multiple entries (multiplicates) occurring in column 10
>> and then, for each multiplicate, calculate the mean of the integers in column 7.
>>
>> As an example, I will show just two columns:
>> Length  Identifier
>> 321     A234
>> 350     A234
>> 340     A234
>> 180     B123
>> 198     B225
>>
>> What I want to do (in the above example) is collapse all the A234's and report
>> the mean to get this:
>> Length  Identifier
>> 337     A234
>> 180     B123
>> 198     B225
>>
>>
>> Matthew
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
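
For readers who want to try the suggestions on the example from the question, here is a minimal base R sketch. It assumes the two columns are named Length and Identifier as in the example (in the real 10-column data frame the names may differ), and it uses the formula interface of aggregate, a variant of option 1 that keeps the original column names in the result. The object names counts, multiplicates and means are only illustrative.

```r
# Example data from the question (two of the ten columns).
DF <- data.frame(
  Length     = c(321, 350, 340, 180, 198),
  Identifier = c("A234", "A234", "A234", "B123", "B225"),
  stringsAsFactors = FALSE
)

# Identifiers that occur more than once (the "multiplicates").
counts <- table(DF$Identifier)
multiplicates <- names(counts)[counts > 1]

# Mean Length per Identifier. Identifiers that occur only once
# simply keep their single value as the mean.
means <- aggregate(Length ~ Identifier, data = DF, FUN = mean)

# Put the columns back in the order used in the question.
means <- means[, c("Length", "Identifier")]

print(multiplicates)
print(means)
```

Because aggregate groups by value rather than by position, this does not depend on the repeated identifiers being in consecutive rows.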


