[R] remove rows based on row mean

Adrian Johnson oriolebaltimore at gmail.com
Thu Aug 18 23:33:23 CEST 2016


Hi Group,
I have a data matrix sm (dput code given below).

I want to create a data matrix in which, for rows that share the same
Gene, only the row with the higher mean is kept.

> sm
     Gene GSM529305 GSM529306 GSM529307 GSM529308
1    A1BG      6.57      6.72      6.83      6.69
2    A1CF      2.91      2.80      3.08      3.00
3   A2LD1      5.82      7.01      6.62      6.87
4     A2M      9.21      9.35      9.32      9.19
5     A2M      2.94      2.50      3.16      2.76
6  A4GALT      6.86      5.75      6.06      7.04
7   A4GNT      3.97      3.56      4.22      3.88
8    AAA1      3.39      2.90      3.16      3.23
9    AAAS      8.26      8.63      8.40      8.70
10   AAAS      6.82      7.15      7.33      6.51

For example, rows 4 and 5 have the same Gene, A2M. I want to
select only the row that has the higher mean. I wrote the following
code, which picks out the duplicate row with the higher mean, but I
cannot properly store the result. Could someone help? Thanks.

ugns <- unique(sm$Gene)

exwidh = c()

for (i in 1:length(ugns)) {
  k = ugns[i]
  exwidh[i] <- sm[names(sort(rowMeans(sm[which(sm[,1]==k),2:ncol(sm)]),decreasing=TRUE)[1]),]
}





structure(list(Gene = c("A1BG", "A1CF", "A2LD1", "A2M", "A2M",
"A4GALT", "A4GNT", "AAA1", "AAAS", "AAAS"), GSM529305 = c(6.57,
2.91, 5.82, 9.21, 2.94, 6.86, 3.97, 3.39, 8.26, 6.82), GSM529306 = c(6.72,
2.8, 7.01, 9.35, 2.5, 5.75, 3.56, 2.9, 8.63, 7.15), GSM529307 = c(6.83,
3.08, 6.62, 9.32, 3.16, 6.06, 4.22, 3.16, 8.4, 7.33), GSM529308 = c(6.69,
3, 6.87, 9.19, 2.76, 7.04, 3.88, 3.23, 8.7, 6.51)), .Names = c("Gene",
"GSM529305", "GSM529306", "GSM529307", "GSM529308"), row.names = c(NA,
10L), class = "data.frame")
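
One possible way to get the desired result, sketched in base R. The loop above selects the right row, but it appears to lose it when storing: a one-row data frame is a list of columns, so assigning it to a single element of exwidh does not keep the whole row. The sketch below avoids the loop. It assumes that every column except Gene is a numeric sample column and that the row names are the default 1..n; the names row_mean, ord and best are illustrative only.

```r
# Rebuild the example data from the dput output above.
sm <- data.frame(
  Gene = c("A1BG", "A1CF", "A2LD1", "A2M", "A2M",
           "A4GALT", "A4GNT", "AAA1", "AAAS", "AAAS"),
  GSM529305 = c(6.57, 2.91, 5.82, 9.21, 2.94, 6.86, 3.97, 3.39, 8.26, 6.82),
  GSM529306 = c(6.72, 2.8, 7.01, 9.35, 2.5, 5.75, 3.56, 2.9, 8.63, 7.15),
  GSM529307 = c(6.83, 3.08, 6.62, 9.32, 3.16, 6.06, 4.22, 3.16, 8.4, 7.33),
  GSM529308 = c(6.69, 3, 6.87, 9.19, 2.76, 7.04, 3.88, 3.23, 8.7, 6.51),
  stringsAsFactors = FALSE
)

# Mean of each row across the sample columns (everything except Gene).
row_mean <- rowMeans(sm[, -1])

# Sort rows by descending mean, then keep the first occurrence of each
# gene, which is the row with the highest mean for that gene.
ord  <- order(row_mean, decreasing = TRUE)
best <- sm[ord, ][!duplicated(sm$Gene[ord]), ]

# Restore the original row order (assumes default integer row names).
best <- best[order(as.integer(rownames(best))), ]

print(best)
```

With the example data this keeps row 4 for A2M and row 9 for AAAS, and leaves the genes that occur only once unchanged. If two rows for the same gene had exactly equal means, the one that comes first in sm would be kept.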


