[R] identify duplicate entries in data frame and calculate mean
Matthew
mccormack at molbio.mgh.harvard.edu
Tue May 24 21:46:44 CEST 2016
I have a data frame with 10 columns.
The last column holds an alphanumeric identifier.
For most rows this identifier is unique within the file,
but some identifiers occur in duplicate, triplicate or more
(let's call them multiplicates). When an identifier occurs more than
once, its rows are always consecutive.
Column 7 holds an integer (it may or may not be unique; that does
not matter).
I want to identify each set of multiple entries (multiplicates) in
column 10 and then, for each multiplicate, calculate the mean of the
integers in column 7.
As an example, I will show just two columns:
Length Identifier
321 A234
350 A234
340 A234
180 B123
198 B225
What I want to do (in the above example) is collapse all the A234 rows
and report their mean, to get this:
Length Identifier
337 A234
180 B123
198 B225
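
To make the request concrete, here is a minimal sketch of one way this could be done in base R, using only the two columns from the example. The data frame name `df` and the objects `means` and `multiplicates` are made up for illustration; on the real 10-column data frame the same idea would presumably apply to columns 7 and 10.

```r
# Toy data copied from the example; `df` is a hypothetical name.
df <- data.frame(
  Length     = c(321, 350, 340, 180, 198),
  Identifier = c("A234", "A234", "A234", "B123", "B225"),
  stringsAsFactors = FALSE
)

# Mean of Length per Identifier. Identifiers that occur only once
# simply keep their single value.
means <- aggregate(Length ~ Identifier, data = df, FUN = mean)

# Identifiers that occur more than once (the "multiplicates").
counts <- table(df$Identifier)
multiplicates <- names(counts)[counts > 1]

print(means)
print(multiplicates)

# Assumption: on the full data frame, with the identifier in column 10
# and the integer in column 7, the equivalent call would be something like
#   aggregate(full[[7]], by = list(Identifier = full[[10]]), FUN = mean)
# where `full` stands for the real data frame.
```

Note that `aggregate()` groups by value, not by position, so it does not rely on the multiplicates being in consecutive rows; it also returns the groups sorted by identifier rather than in their original order.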
Matthew