[R] FW: average replicate probe values

jim holtman jholtman at gmail.com
Thu Jul 24 02:15:42 CEST 2008


Here is one way to do it:

> y <- textConnection("UNIQID UniGene Gene 1_SL 2_SL 17_SL 18_SL  38_SL
+ 1175390 Hs.10095 MLLT1 -0.00595 0.62315 0.85315 1.11215 -0.195
+ 1175392 Hs.10101 C1orf166 -0.4945 -0.04025 0.1299 -0.00575 -0.1824
+ 1187428 Hs.101014 CEP57 0.60085 0.2564 -0.42885 -0.57635 -0.14735
+ 1193447 Hs.101014 CEP57 -0.15625 -0.1681 -0.4891 -0.29995 NA
+ 1173756 Hs.1011 PROZ -0.7211 -0.68895 0.4651 0.30815 0.1133")
> x <- read.table(y, header=TRUE)
> closeAllConnections()
> # split and then aggregate so we can carry through some data
> z <- split(x, x$UniGene)
> z.l <- lapply(z, function(.data){
+     .agg <- colMeans(.data[, c(1,4:8)], na.rm=TRUE)
+     data.frame(.data[1, 2], .data[1, 3], lapply(.agg, unlist))
+ })
> do.call(rbind, z.l)
          .data.1..2. .data.1..3.  UNIQID    X1_SL    X2_SL    X17_SL   X18_SL   X38_SL
Hs.10095     Hs.10095       MLLT1 1175390 -0.00595  0.62315  0.853150  1.11215 -0.19500
Hs.10101     Hs.10101    C1orf166 1175392 -0.49450 -0.04025  0.129900 -0.00575 -0.18240
Hs.101014   Hs.101014       CEP57 1190438  0.22230  0.04415 -0.458975 -0.43815 -0.14735
Hs.1011       Hs.1011        PROZ 1173756 -0.72110 -0.68895  0.465100  0.30815  0.11330
>
>
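If the probe ID does not need to be carried through, aggregate() may be a shorter route. The following is a minimal sketch, not a tested recipe for the full data set: it assumes the same five sample rows, uses read.table(text = ...) instead of a text connection, and drops UNIQID on the assumption that an averaged probe ID is of no use. The names `samples` and `avg` are illustrative only.

```r
# Same sample data as in the session above.
x <- read.table(text = "UNIQID UniGene Gene 1_SL 2_SL 17_SL 18_SL 38_SL
1175390 Hs.10095 MLLT1 -0.00595 0.62315 0.85315 1.11215 -0.195
1175392 Hs.10101 C1orf166 -0.4945 -0.04025 0.1299 -0.00575 -0.1824
1187428 Hs.101014 CEP57 0.60085 0.2564 -0.42885 -0.57635 -0.14735
1193447 Hs.101014 CEP57 -0.15625 -0.1681 -0.4891 -0.29995 NA
1173756 Hs.1011 PROZ -0.7211 -0.68895 0.4651 0.30815 0.1133",
                header = TRUE, stringsAsFactors = FALSE)

# Sample columns are everything except the three identifier columns
# (read.table renames "1_SL" to "X1_SL" and so on).
samples <- setdiff(names(x), c("UNIQID", "UniGene", "Gene"))

# Average each sample column within each UniGene/Gene pair;
# na.rm = TRUE is passed on to mean() so the NA in 38_SL is skipped.
avg <- aggregate(x[samples],
                 by = list(UniGene = x$UniGene, Gene = x$Gene),
                 FUN = mean, na.rm = TRUE)
avg
```

The result has one row per UniGene ID, with the two CEP57 probes collapsed into a single row of per-sample means.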

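The split/lapply approach above averages the UNIQID column along with the samples and leaves the first two columns with auto-generated names (.data.1..2., .data.1..3.). A possible variation, sketched here under the assumption that keeping the first probe's UNIQID for each gene is acceptable, names the identifier columns explicitly and averages only columns 4 to 8. The name `res` is illustrative only.

```r
# Same sample data as in the session above.
x <- read.table(text = "UNIQID UniGene Gene 1_SL 2_SL 17_SL 18_SL 38_SL
1175390 Hs.10095 MLLT1 -0.00595 0.62315 0.85315 1.11215 -0.195
1175392 Hs.10101 C1orf166 -0.4945 -0.04025 0.1299 -0.00575 -0.1824
1187428 Hs.101014 CEP57 0.60085 0.2564 -0.42885 -0.57635 -0.14735
1193447 Hs.101014 CEP57 -0.15625 -0.1681 -0.4891 -0.29995 NA
1173756 Hs.1011 PROZ -0.7211 -0.68895 0.4651 0.30815 0.1133",
                header = TRUE, stringsAsFactors = FALSE)

z.l <- lapply(split(x, x$UniGene), function(.data) {
    # Average only the sample columns (4 to 8); na.rm skips missing values.
    .agg <- colMeans(.data[, 4:8], na.rm = TRUE)
    # Name the identifier columns and keep the first probe's UNIQID
    # (an assumption; the original code averages UNIQID instead).
    data.frame(UNIQID = .data$UNIQID[1], UniGene = .data$UniGene[1],
               Gene = .data$Gene[1], as.list(.agg))
})
res <- do.call(rbind, z.l)
res
```

The row names of `res` are the UniGene IDs, so a single gene can be looked up with, for example, res["Hs.101014", ].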

On Wed, Jul 23, 2008 at 5:08 PM, Kaposi-Novak, Pal
<kaposinovakp at upmc.edu> wrote:
>
> ________________________________________
> From: Kaposi-Novak, Pal
> Sent: Wednesday, July 23, 2008 5:07 PM
> To: jim holtman
> Subject: RE: [R] average replicate probe values
>
> Dear Dr Holtman,
>
> Thank you very much for your response.
>
> What I want is to average data points in a data.frame from probes which represent the same gene (i.e. have the same UniGene ID).
>
> For example in the table below probe sets in rows 3 and 4 both represent the CEP57 gene.
>
> UNIQID UniGene Gene 1_SL 2_SL 17_SL 18_SL 38_SL
> 1175390 Hs.10095 MLLT1 -0.00595 0.62315 0.85315 1.11215 -0.195
> 1175392 Hs.10101 C1orf166 -0.4945 -0.04025 0.1299 -0.00575 -0.1824
> 1187428 Hs.101014 CEP57 0.60085 0.2564 -0.42885 -0.57635 -0.14735
> 1193447 Hs.101014 CEP57 -0.15625 -0.1681 -0.4891 -0.29995 NA
> 1173756 Hs.1011 PROZ -0.7211 -0.68895 0.4651 0.30815 0.1133
>
> I would like to make R find the matching UniGene IDs and average expression values for each sample.
> The result would look like the table below:
>
> UNIQID UniGene Gene 1_SL 2_SL 17_SL 18_SL 38_SL
> 1175390 Hs.10095 MLLT1 -0.00595 0.62315 0.85315 1.11215 -0.195
> 1175392 Hs.10101 C1orf166 -0.4945 -0.04025 0.1299 -0.00575 -0.1824
> 1199466 Hs.101014 CEP57 0.2223 0.04415 -0.458975 -0.43815 -0.14735
> 1173756 Hs.1011 PROZ -0.7211 -0.68895 0.4651 0.30815 0.1133
>
> I am sorry for the naivety of my question, but I am not a trained biostatistician; I just need to analyze data.
>
> Sincerely,
>
> Pal Kaposi-Novak MD PhD
> PIRT Fellow
> University of Pittsburgh
> Department of Pathology
> BST S408, 200 Lothrop Str
> Pittsburgh, PA , 15261
> Tel: (412) 383-7748
> kaposinovakp at umpc.edu
> ________________________________________
> From: jim holtman [jholtman at gmail.com]
> Sent: Wednesday, July 23, 2008 7:15 AM
> To: Kaposi-Novak, Pal
> Cc: r-help at r-project.org
> Subject: Re: [R] average replicate probe values
>
> It would be helpful if you included a sample of the data so that we
> could understand what you would like to do with it (before/after
> pictures).
>
> ?aggregate
>
> On Tue, Jul 22, 2008 at 9:57 PM, Kaposi-Novak, Pal
> <kaposinovakp at upmc.edu> wrote:
>> Hi,
>>
>> Could somebody tell me how I can average expression values of replicate probe sets in a data frame?
>>
>> Thanks
>>
>> Pal Kaposi-Novak MD PhD
>> PIRT Fellow
>> University of Pittsburgh
>> Department of Pathology
>> BST S408, 200 Lothrop Str
>> Pittsburgh, PA , 15261
>> Tel: (412) 383-7748
>> kaposinovakp at umpc.edu
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


