[R] average columns of data frame corresponding to replicates

jim holtman jholtman at gmail.com
Thu Sep 9 13:31:11 CEST 2010


try this:

> myData
   sample1.id1 sample1.id2 sample2.id1 sample1.id3 sample3.id1 sample1.id4 sample2.id2
1            1           2           2           1           1           1           1
2            1           2           2           2           1           2           1
3            1           2           2           3           1           3           1
4            1           2           2           4           1           4           1
5            1           2           2           5           1           5           1
6            1           2           2           6           1           6           1
7            1           2           2           7           1           7           1
8            1           2           2           8           1           8           1
9            1           2           2           9           1           9           1
10           1           2           2          10           1          10           1
> newData <- NULL
> for (i in repeat_ids){
+     # determine the columns to use
+     colIndx <- grep(paste(i, "$", sep=''), colnames(myData))
+     if (length(colIndx) == 0) next  # make sure it exists
+     # create the average of the columns
+     # drop = FALSE keeps a one-column selection 2-D so rowMeans() still works
+     newData <- cbind(newData, rowMeans(myData[, colIndx, drop = FALSE], na.rm=TRUE))
+     colnames(newData)[ncol(newData)] <- i  # add the name
+ }
> newData
           id1 id2
 [1,] 1.333333 1.5
 [2,] 1.333333 1.5
 [3,] 1.333333 1.5
 [4,] 1.333333 1.5
 [5,] 1.333333 1.5
 [6,] 1.333333 1.5
 [7,] 1.333333 1.5
 [8,] 1.333333 1.5
 [9,] 1.333333 1.5
[10,] 1.333333 1.5
>
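A loop-free alternative, offered only as a sketch: it assumes every column name has the form <sample>.<id> and treats the text after the last "." as the id. It groups the columns with base R's split.default() and averages each group with rowMeans(), so non-repeated ids (id3, id4) are carried over unchanged and the result is a data frame rather than a matrix. The names ids, groups and averaged are illustrative only.

```r
# Same example data as in the question
myData <- data.frame("sample1.id1" = rep(1, 10),
                     "sample1.id2" = rep(2, 10),
                     "sample2.id1" = rep(2, 10),
                     "sample1.id3" = 1:10,
                     "sample3.id1" = rep(1, 10),
                     "sample1.id4" = 1:10,
                     "sample2.id2" = rep(1, 10))

# Assumption: the id is everything after the last "." in the column name
ids <- sub("^.*\\.", "", names(myData))

# split.default() on a data frame splits it by column; using a factor whose
# levels are unique(ids) keeps the groups in order of first appearance
groups <- split.default(myData, factor(ids, levels = unique(ids)))

# Average each group row-wise; a one-column group simply returns its values
averaged <- as.data.frame(lapply(groups, rowMeans, na.rm = TRUE))
print(averaged)
```

For myData this should give id1 = 1.333333 and id2 = 1.5 in every row, matching the loop output above, plus id3 and id4 equal to 1:10.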


On Tue, Sep 7, 2010 at 12:00 PM, Juliet Hannah <juliet.hannah at gmail.com> wrote:
> Hi Group,
>
> I have the data frame below. Within it, some samples (columns) are
> measured more than once. Samples are identified by the "idN" suffix
> of the column name, so "id1" is present in columns 1, 3, and 5. Not
> every id is repeated. I would like to create a new data frame in
> which the repeated ids are averaged. For example, in the new data
> frame, columns 1, 3, and 5 of the original will be replaced by one
> new column that is the mean of these three. Thanks for any
> suggestions.
>
> Juliet
>
>
>
> myData <- data.frame("sample1.id1" =rep(1,10),
> "sample1.id2"=rep(2,10),
> "sample2.id1" = rep(2,10),
> "sample1.id3" = 1:10,
> "sample3.id1" = rep(1,10),
> "sample1.id4" = 1:10,
> "sample2.id2" = rep(1,10))
>
> repeat_ids <- c("id1","id2")
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
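If the list of repeated ids should not be typed by hand, it can be derived from the column names. A minimal sketch, again assuming the id is whatever follows the last "." in each name; ids is an illustrative name, and the result can be used as repeat_ids in the loop.

```r
# Same example data as in the question
myData <- data.frame("sample1.id1" = rep(1, 10),
                     "sample1.id2" = rep(2, 10),
                     "sample2.id1" = rep(2, 10),
                     "sample1.id3" = 1:10,
                     "sample3.id1" = rep(1, 10),
                     "sample1.id4" = 1:10,
                     "sample2.id2" = rep(1, 10))

# Assumption: the id is the text after the last "." in each column name
ids <- sub("^.*\\.", "", names(myData))

# duplicated() flags the 2nd, 3rd, ... occurrence of each id;
# unique() then keeps one entry per id that occurs more than once
repeat_ids <- unique(ids[duplicated(ids)])
print(repeat_ids)
```

For myData this yields c("id1", "id2"), the same vector that was typed by hand.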



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


