Fwd: Re: [BioC] Different levels of replicates and how to create a correct targets file out of that.

Johan Lindberg johanl at kiev.biotech.kth.se
Fri Apr 2 11:09:33 CEST 2004


Hi All.
Thank you for a very helpful discussion. I have a followup question on a 
remark from Gordon.

"I think your approach is actually a good one *but* you need to give double 
weight to cases where you have averaged over two technical replicates. Use 
the 'weights' component of your MAList object to do this."

I have never used weights in LIMMA, and the help file for gls.series doesn't 
say what range the weights should be in. I searched the mail 
archives for information about weights, but I found only information on 
spot weights when creating the RG object etc.

So this is how I think it is done. I have six columns of slides in my 
M-value matrix. I create a weight matrix of the same size and give the slides 
that I want to give "double" weight a 2 and the slides that I want to give 
"normal" weight a 1. Something like this:

weightmatrixp <- matrix(nrow = 64896, ncol = 6)
weightmatrixp[,1] <- 1
weightmatrixp[,2] <- 2
weightmatrixp[,3] <- 2
weightmatrixp[,4] <- 1
weightmatrixp[,5] <- 1
weightmatrixp[,6] <- 1

And then I use this in gls.series:

fitpTB <- gls.series(MpannusTB, design=designpTB, ndups=2, spacing=32448, 
correlation=corp$cor,weights= weightmatrixp)

Is this the correct way of using weights?
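
For reference, the same matrix can be built in one call. This is only a 
sketch in base R, assuming (as above) that columns 2 and 3 are the ones 
averaged over technical replicates; n_rows and slide_weights are names made 
up for the illustration, and nothing here says whether 1 and 2 are the right 
values to use:

```r
# Minimal sketch in base R; n_rows and slide_weights are illustrative names.
n_rows <- 64896                       # 2 * 32448 rows in the stacked M-value matrix
slide_weights <- c(1, 2, 2, 1, 1, 1)  # 2 = column averaged over two technical replicates

# byrow = TRUE fills row by row, so every row is a copy of slide_weights
# and each column holds a single constant weight.
weightmatrixp <- matrix(slide_weights, nrow = n_rows,
                        ncol = length(slide_weights), byrow = TRUE)

# Sanity checks: same shape as the M-value matrix, constant columns.
stopifnot(identical(dim(weightmatrixp), c(64896L, 6L)))
stopifnot(all(weightmatrixp[, 2] == 2), all(weightmatrixp[, 1] == 1))
```

The result is the same 64896 x 6 matrix as the column-by-column version above.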

Best regards

/ Johan Lindberg





>Date: Thu, 01 Apr 2004 17:43:13 +1000
>To: Johan Lindberg <johanl at kiev.biotech.kth.se>
>From: Gordon Smyth <smyth at wehi.edu.au>
>Subject: Re: [BioC] Different levels of replicates and how to create a
>   correct targets file out of that.
>Cc: bioconductor at stat.math.ethz.ch
>
>Dear Johan,
>
>Now I've had a chance to read your email more thoroughly, I think you 
>actually have a clever approach.
>
>At 11:51 PM 30/03/2004, Johan Lindberg wrote:
>>Sorry, I forgot to have a subject on the mail I sent before.
>>
>>Hello everyone.
>>I would really appreciate some comments/hints/help with a pretty long 
>>question.
>>
>>I have an experiment consisting of 18 hybridizations. On the 30K cDNA 
>>arrays, knee joint biopsies (from different patients) taken before and 
>>after a certain treatment are hybridized. What I want to find out is the 
>>effect of the treatment, not the difference between the patients. The 
>>problem is how to deal with different levels of replicates and how to 
>>create a correct targets file, since I have no common reference.
>>This is what the experimental set-up looks like.
>>
>>Patient  Hybridization  Cy3                        Cy5
>>1        1A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>         1B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>3        2A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>         2B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>         3A             Biopsy 2 before treatment  Biopsy 2 after treatment
>>         3B             Biopsy 2 after treatment   Biopsy 2 before treatment
>>4        4A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>         4B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>         5A             Biopsy 2 before treatment  Biopsy 2 after treatment
>>         5B             Biopsy 2 after treatment   Biopsy 2 before treatment
>>5        6A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>         6B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>6        7A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>         7B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>7        8A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>         8B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>10       9A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>         9B             Biopsy 1 after treatment   Biopsy 1 before treatment
>
>You have an unbalanced design with three error strata: patient, biopsy, 
>microarray. In principle one would like to treat this using a model with 
>nested random effects but, as recent discussion has indicated, this is not 
>so straightforward.
>
>>As you can see, different patients have one or two biopsies taken from 
>>them. I realize it would be a mistake to include all of those in the 
>>targets file, because having more measurements of a certain patient 
>>would bias the ranking of the B-statistic towards the patient with the 
>>most biopsies in the end, right? Or?
>>Since the differentially expressed genes in the patient with more 
>>biopsies will get smaller variance?
>>
>>My solution to the problem was just to create an artificial Mmatrix twice 
>>as long as the original MA object. For the patients with two biopsies I 
>>averaged over the technical replicates (dye-swaps) and put the values 
>>from biopsy one and then the values from biopsy two in the matrix. From 
>>patients with just a technical replicate I put the values from 
>>hybridization 1A and then hybridization 1B into the matrix.
>>
>>The M-values of that matrix object would look something like:
>>
>>                    patient 1         patient 3                           ....
>>Rows 1-30000        Hybridization 1A  Average of hybridization 2A and 2B  ....
>>Rows 30001-60000    Hybridization 1B  Average of hybridization 3A and 3B  ....
>>
>>After this I plan to use dupcor on the new matrix of M-values, as if I 
>>would have a slide with replicate spots on it.
>>
>>So far so good or? Is this a good way of treating replicates on different 
>>levels or has anyone else some better idea of how to do this. Comments 
>>please.....
>
>This is actually very clever. You've got rid of one error stratum by 
>averaging, then use duplicateCorrelation to handle the other. I think your 
>approach is actually a good one *but* you need to give double weight to 
>cases where you have averaged over two technical replicates. Use the 
>'weights' component of your MAList object to do this.
>
>>And now, how to create a correct targets file since I have no common 
>>reference.
>>I guess it would look something like this:
>>
>>SlideNumber     Name    FileName        Cy3     Cy5
>>1       pat1_p  test1.gpr       Before_p1       After_p1
>>2       pat3_p  test2.gpr       Before_p2       After_p2
>>3       pat4_p  test3.gpr       Before_p3       After_p3
>>4       pat6_p  test4.gpr       Before_p4       After_p4
>>5       pat7_p  test5.gpr       Before_p5       After_p5
>>6       pat10_p test6.gpr       Before_p6       After_p6
>>
>>But when I want to make my contrast matrix I am lost since I do not have 
>>anything to write as ref.
>>design <- modelMatrix(targets, ref="????????")
>
>If I have understood your approach, you don't need to do anything about 
>the targets file or the design matrix. Just use design <- rep(1,6). You 
>now have independent M-values estimating the same thing.
>
>Gordon
>
>>If I redo the matrix to
>>
>>SlideNumber     Name    FileName        Cy3     Cy5
>>1       pat1_p  test1.gpr       Before_p        After_p
>>2       pat3_p  test2.gpr       Before_p        After_p
>>3       pat4_p  test3.gpr       Before_p        After_p
>>4       pat6_p  test4.gpr       Before_p        After_p
>>5       pat7_p  test5.gpr       Before_p        After_p
>>6       pat10_p test6.gpr       Before_p        After_p
>>
>>wouldn't that be the same as treating this as a common reference design 
>>when it is not? And wouldn't that affect the variance of the experiment? 
>>How do I do this in a correct way?
>>I looked at the Zebra fish example in the LIMMA user guide, but isn't that 
>>wrong as well? Because technical and biological replicates are treated 
>>the same way in the targets file of the zebra fish.
>>
>>I realize that many of these questions should have been considered before 
>>conducting the lab part but unfortunately they were not. So I will not be 
>>surprised if someone sends me the same quote as I got yesterday from a friend:
>>
>>"To consult a statistician after an experiment is finished is often 
>>merely to ask him to conduct a post mortem examination. He can perhaps 
>>say what the experiment died of."
>>- R.A. Fisher, Presidential Address to the First Indian Statistical 
>>Congress, 1938
>>
>>Best regards
>>
>>/Johan Lindberg


