Fwd: Re: [BioC] Different levels of replicates and how to
create a correct targets file out of that.
Johan Lindberg
johanl at kiev.biotech.kth.se
Fri Apr 2 11:09:33 CEST 2004
Hi All.
Thank you for a very helpful discussion. I have a followup question on a
remark from Gordon.
"I think your approach is actually a good one *but* you need to give double
weight to cases where you have averaged over two technical replicates. Use
the 'weights' component of your MAList object to do this."
I have never used weights in LIMMA, and the help file for gls.series doesn't
say what range the weights should be in. I searched the mail archives for
information about weights but found only information on spot weights when
creating the RG object etc.
So this is how I think it is done. I have six columns of slides in my
Mvalue matrix. I create a weightmatrix of the same size and give the slides
that I want to give "double" weight a 2 and slides that I want to give
"normal" weight 1. Something like this:
weightmatrixp <- matrix(nrow = 64896, ncol = 6)
weightmatrixp[,1] <- 1
weightmatrixp[,2] <- 2
weightmatrixp[,3] <- 2
weightmatrixp[,4] <- 1
weightmatrixp[,5] <- 1
weightmatrixp[,6] <- 1
And then I use this in gls.series:
fitpTB <- gls.series(MpannusTB, design=designpTB, ndups=2, spacing=32448,
correlation=corp$cor,weights= weightmatrixp)
Is this the correct way of using weights?
Best regards
/ Johan Lindberg
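
A minimal sketch of the weight matrix described above, using base R only. It assumes the dimensions quoted in the message (64896 rows, 6 columns) and that columns 2 and 3 are the ones averaged over two technical replicates. The rationale for the value 2 is that the average of two equally variable replicates has half the variance of a single measurement, so its relative precision is doubled; only the ratio between weights matters in a weighted fit. The toy M-values in `m` are made up for illustration, and the `lm` check only shows how relative weights enter a weighted least-squares fit in general. It does not confirm how gls.series itself handles its weights argument.

```r
# Dimensions taken from the message above.
n_rows <- 64896

# One relative weight per column: 2 where the column is an average of two
# technical replicates (assumed here to be columns 2 and 3), 1 otherwise.
col_weights <- c(1, 2, 2, 1, 1, 1)

# byrow = TRUE recycles col_weights along each row, so every row of the
# matrix is identical and every column holds a single constant weight.
weightmatrixp <- matrix(col_weights, nrow = n_rows,
                        ncol = length(col_weights), byrow = TRUE)

# Toy check for one gene (made-up M-values): an intercept-only weighted
# least-squares fit returns the weighted mean of the six values.
m <- c(0.8, 1.1, 0.9, 1.3, 0.7, 1.0)
fit <- lm(m ~ 1, weights = col_weights)
all.equal(unname(coef(fit)), weighted.mean(m, col_weights))
```

This produces the same matrix as the six column-wise assignments above, in one call.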
>Date: Thu, 01 Apr 2004 17:43:13 +1000
>To: Johan Lindberg <johanl at kiev.biotech.kth.se>
>From: Gordon Smyth <smyth at wehi.edu.au>
>Subject: Re: [BioC] Different levels of replicates and how to create a
> correct targets file out of that.
>Cc: bioconductor at stat.math.ethz.ch
>
>Dear Johan,
>
>Now I've had a chance to read your email more thoroughly, I think you
>actually have a clever approach.
>
>At 11:51 PM 30/03/2004, Johan Lindberg wrote:
>>Sorry, I forgot to have a subject on the mail I sent before.
>>
>>Hello everyone.
>>I would really appreciate some comments/hints/help with a pretty long
>>question.
>>
>>I have an experiment consisting of 18 hybridizations. On the 30K cDNA
>>arrays, knee joint biopsies (from different patients) before and after a
>>certain treatment are hybridized. What I want to find out is the effect of
>>the treatment, not the difference between the patients. The problem is
>>how to deal with different levels of replicates and how to create a
>>correct targets file, since I have no common reference.
>>This is what the experimental set-up looks like.
>>
>>Patient  Hybridization  Cy3                        Cy5
>>1        1A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>         1B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>3        2A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>         2B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>         3A             Biopsy 2 before treatment  Biopsy 2 after treatment
>>         3B             Biopsy 2 after treatment   Biopsy 2 before treatment
>>4        4A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>         4B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>         5A             Biopsy 2 before treatment  Biopsy 2 after treatment
>>         5B             Biopsy 2 after treatment   Biopsy 2 before treatment
>>5        6A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>         6B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>6        7A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>         7B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>7        8A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>         8B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>10       9A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>         9B             Biopsy 1 after treatment   Biopsy 1 before treatment
>
>You have an unbalanced design with three error strata: patient, biopsy,
>microarray. In principle one would like to treat this using a model with
>nested random effects but, as recent discussion has indicated, this is not
>so straightforward.
>
>>As you can see, different patients have one or two biopsies taken from
>>them. I realize it would be a mistake to include all of those in the
>>targets file, because having more measurements of a certain patient would
>>bias the ranking of the B-statistic towards the patient with the most
>>biopsies, right? Or?
>>Since the differentially expressed genes in the patient with more
>>biopsies will get smaller variance?
>>
>>My solution to the problem was just to create an artificial Mmatrix twice
>>as long as the original MA object. For the patients with two biopsies I
>>averaged over the technical replicates (dye-swaps) and put the values
>>from biopsy one and then the values from biopsy two in the matrix. From
>>patients with just a technical replicate I put the values from
>>hybridization 1A and then hybridization 1B into the matrix.
>>
>>The M-values of that matrix object would look something like:
>>
>>                  patient 1         patient 3                           ....
>>Rows 1-30000      Hybridization 1A  Average of hybridization 2A and 2B  ....
>>Rows 30001-60000  Hybridization 1B  Average of hybridization 3A and 3B  ....
>>
>>After this I plan to use dupcor on the new matrix of M-values, as if I
>>would have a slide with replicate spots on it.
>>
>>So far so good or? Is this a good way of treating replicates on different
>>levels or has anyone else some better idea of how to do this. Comments
>>please.....
>
>This is actually very clever. You've got rid of one error stratum by
>averaging, then use duplicateCorrelation to handle the other. I think your
>approach is actually a good one *but* you need to give double weight to
>cases where you have averaged over two technical replicates. Use the
>'weights' component of your MAList object to do this.
>
>>And now, how to create a correct targets file since I have no common
>>reference.
>>I guess it would look something like this:
>>
>>SlideNumber Name FileName Cy3 Cy5
>>1 pat1_p test1.gpr Before_p1 After_p1
>>2 pat3_p test2.gpr Before_p2 After_p2
>>3 pat4_p test3.gpr Before_p3 After_p3
>>4 pat6_p test4.gpr Before_p4 After_p4
>>5 pat7_p test5.gpr Before_p5 After_p5
>>6 pat10_p test6.gpr Before_p6 After_p6
>>
>>But when I want to make my contrast matrix I am lost since I do not have
>>anything to write as ref.
>>design <- modelMatrix(targets, ref="????????")
>
>If I have understood your approach, you don't need to do anything about
>the targets file or the design matrix. Just use design <- rep(1,6). You
>now have independent M-values estimating the same thing.
>
>Gordon
>
>>If I redo the matrix to
>>
>>SlideNumber Name FileName Cy3 Cy5
>>1 pat1_p test1.gpr Before_p After_p
>>2 pat3_p test2.gpr Before_p After_p
>>3 pat4_p test3.gpr Before_p After_p
>>4 pat6_p test4.gpr Before_p After_p
>>5 pat7_p test5.gpr Before_p After_p
>>6 pat10_p test6.gpr Before_p After_p
>>
>>wouldn't that be the same as treating this as a common reference design
>>when it is not? And wouldn't that affect the variance of the experiment?
>>How do I do this in a correct way?
>>I looked at the Zebra fish example in the LIMMA user guide, but isn't that
>>wrong as well? Because technical and biological replicates are treated
>>the same way in the targets file of the zebra fish.
>>
>>I realize that many of these questions should have been considered before
>>conducting the lab part but unfortunately they were not. So I will not be
>>surprised if someone sends me the same quote as I got yesterday from a friend:
>>
>>"To consult a statistician after an experiment is finished is often
>>merely to ask him to conduct a post mortem examination. He can perhaps
>>say what the experiment died of."
>>- R.A. Fisher, Presidential Address to the First Indian Statistical
>>Congress, 1938
>>
>>Best regards
>>
>>/Johan Lindberg