[R] average duplicated rows?
arun
smartpink111 at yahoo.com
Fri Oct 12 20:22:14 CEST 2012
HI,
You can also try this:
dat1<-read.table(text="
gene_id sample_1 sample_2 FL_EARLY FL_LATE
763938 Eucgr.A00054 fl_S1E fl_S1L 13.170800 22.2605
763979 Eucgr.A00101 fl_S1E fl_S1L 0.367960 14.1202
1273243 Eucgr.A00101 fl_S2 fl_S1L 0.356625 14.1202
764169 Eucgr.A00350 fl_S1E fl_S1L 7.381070 43.9275
1273433 Eucgr.A00350 fl_S2 fl_S1L 10.674500 43.9275
1273669 Eucgr.A00650 fl_S2 fl_S1L 33.669100 50.0169
764480 Eucgr.A00744 fl_S1E fl_S1L 132.429000 747.2770
1273744 Eucgr.A00744 fl_S2 fl_S1L 142.659000 747.2770
764595 Eucgr.A00890 fl_S1E fl_S1L 2.937760 14.9647
764683 Eucgr.A00990 fl_S1E fl_S1L 8.681250 48.5492
1273947 Eucgr.A00990 fl_S2 fl_S1L 10.553300 48.5492
764710 Eucgr.A01020 fl_S1E fl_S1L 0.000000 57.9273
1273974 Eucgr.A01020 fl_S2 fl_S1L 0.000000 57.9273
764756 Eucgr.A01073 fl_S1E fl_S1L 8.504710 101.1870
1274020 Eucgr.A01073 fl_S2 fl_S1L 5.400010 101.1870
764773 Eucgr.A01091 fl_S1E fl_S1L 3.448910 15.7756
764826 Eucgr.A01152 fl_S1E fl_S1L 69.565700 198.2320
764831 Eucgr.A01158 fl_S1E fl_S1L 7.265640 30.9565
764845 Eucgr.A01172 fl_S1E fl_S1L 3.248020 16.9127
764927 Eucgr.A01269 fl_S1E fl_S1L 18.710200 76.6918
",sep="",header=TRUE,stringsAsFactors=FALSE)
do.call(rbind,lapply(lapply(split(dat1,dat1$gene_id),`[`,4:5),colMeans))
FL_EARLY FL_LATE
#Eucgr.A00054 13.1708000 22.2605
#Eucgr.A00101 0.3622925 14.1202
#Eucgr.A00350 9.0277850 43.9275
#Eucgr.A00650 33.6691000 50.0169
#Eucgr.A00744 137.5440000 747.2770
#Eucgr.A00890 2.9377600 14.9647
#Eucgr.A00990 9.6172750 48.5492
#Eucgr.A01020 0.0000000 57.9273
#Eucgr.A01073 6.9523600 101.1870
#Eucgr.A01091 3.4489100 15.7756
#Eucgr.A01152 69.5657000 198.2320
#Eucgr.A01158 7.2656400 30.9565
#Eucgr.A01172 3.2480200 16.9127
#Eucgr.A01269 18.7102000 76.6918
In addition to aggregate(), ddply() etc.
library(data.table)
dat2<-data.table(dat1)
dat3<-dat2[,list(FL_EARLY=mean(FL_EARLY),FL_LATE=mean(FL_LATE)),list(gene_id)]
#aggregate()
dat4<-with(dat1,aggregate(cbind(FL_EARLY,FL_LATE),list(gene_id),FUN=mean))
colnames(dat4)<-colnames(dat1)[c(1,4,5)]
#ddply()
library(plyr)
dat5<-ddply(dat1,.(gene_id),colwise(mean,c("FL_EARLY","FL_LATE")))
A.K.
----- Original Message -----
From: "Vining, Kelly" <Kelly.Vining at oregonstate.edu>
To: "r-help at r-project.org" <r-help at r-project.org>
Cc:
Sent: Friday, October 12, 2012 11:41 AM
Subject: [R] average duplicated rows?
Dear useRs,
I have a slightly complicated data structure and am stuck trying to extract what I need. I'm pasting an example of this data below. In some cases, there are duplicates in the "gene_id" column because there are two different "sample 1" values for a given "sample 2" value. Where these duplicates exist, I need to average the corresponding "FL_EARLY" values and retain the "FL_LATE" value and replace those two rows with a row containing the "FL_EARLY" average so that I no longer have any "gene_id" duplicates.
Seems like this is a job for some version of the apply function, but searching and puzzling over this has not gotten me anywhere. Any help will be much appreciated!
Example data:
gene_id sample_1 sample_2 FL_EARLY FL_LATE
763938 Eucgr.A00054 fl_S1E fl_S1L 13.170800 22.2605
763979 Eucgr.A00101 fl_S1E fl_S1L 0.367960 14.1202
1273243 Eucgr.A00101 fl_S2 fl_S1L 0.356625 14.1202
764169 Eucgr.A00350 fl_S1E fl_S1L 7.381070 43.9275
1273433 Eucgr.A00350 fl_S2 fl_S1L 10.674500 43.9275
1273669 Eucgr.A00650 fl_S2 fl_S1L 33.669100 50.0169
764480 Eucgr.A00744 fl_S1E fl_S1L 132.429000 747.2770
1273744 Eucgr.A00744 fl_S2 fl_S1L 142.659000 747.2770
764595 Eucgr.A00890 fl_S1E fl_S1L 2.937760 14.9647
764683 Eucgr.A00990 fl_S1E fl_S1L 8.681250 48.5492
1273947 Eucgr.A00990 fl_S2 fl_S1L 10.553300 48.5492
764710 Eucgr.A01020 fl_S1E fl_S1L 0.000000 57.9273
1273974 Eucgr.A01020 fl_S2 fl_S1L 0.000000 57.9273
764756 Eucgr.A01073 fl_S1E fl_S1L 8.504710 101.1870
1274020 Eucgr.A01073 fl_S2 fl_S1L 5.400010 101.1870
764773 Eucgr.A01091 fl_S1E fl_S1L 3.448910 15.7756
764826 Eucgr.A01152 fl_S1E fl_S1L 69.565700 198.2320
764831 Eucgr.A01158 fl_S1E fl_S1L 7.265640 30.9565
764845 Eucgr.A01172 fl_S1E fl_S1L 3.248020 16.9127
764927 Eucgr.A01269 fl_S1E fl_S1L 18.710200 76.6918
--Kelly V.
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list