[BioC] Different levels of replicates and how to create a correct targets file out of that.

Gordon Smyth smyth at wehi.edu.au
Thu Apr 1 09:43:13 CEST 2004


Dear Johan,

Now that I've had a chance to read your email more thoroughly, I think you 
actually have a clever approach.

At 11:51 PM 30/03/2004, Johan Lindberg wrote:
>Sorry, I forgot to put a subject on the mail I sent before.
>
>Hello everyone.
>I would really appreciate some comments/hints/help with a pretty long 
>question.
>
>I have an experiment consisting of 18 hybridizations. On the 30K cDNA 
>arrays, knee-joint biopsies (from different patients) taken before and after 
>a certain treatment are hybridized. What I want to find out is the effect of 
>the treatment, not the difference between the patients. The problem is how 
>to deal with different levels of replicates, and how to create a correct 
>targets file when I have no common reference.
>This is what the experimental set-up looks like:
>
>Patient  Hybridization  Cy3                        Cy5
>1        1A             Biopsy 1 before treatment  Biopsy 1 after treatment
>         1B             Biopsy 1 after treatment   Biopsy 1 before treatment
>3        2A             Biopsy 1 before treatment  Biopsy 1 after treatment
>         2B             Biopsy 1 after treatment   Biopsy 1 before treatment
>         3A             Biopsy 2 before treatment  Biopsy 2 after treatment
>         3B             Biopsy 2 after treatment   Biopsy 2 before treatment
>4        4A             Biopsy 1 before treatment  Biopsy 1 after treatment
>         4B             Biopsy 1 after treatment   Biopsy 1 before treatment
>         5A             Biopsy 2 before treatment  Biopsy 2 after treatment
>         5B             Biopsy 2 after treatment   Biopsy 2 before treatment
>5        6A             Biopsy 1 before treatment  Biopsy 1 after treatment
>         6B             Biopsy 1 after treatment   Biopsy 1 before treatment
>6        7A             Biopsy 1 before treatment  Biopsy 1 after treatment
>         7B             Biopsy 1 after treatment   Biopsy 1 before treatment
>7        8A             Biopsy 1 before treatment  Biopsy 1 after treatment
>         8B             Biopsy 1 after treatment   Biopsy 1 before treatment
>10       9A             Biopsy 1 before treatment  Biopsy 1 after treatment
>         9B             Biopsy 1 after treatment   Biopsy 1 before treatment

You have an unbalanced design with three error strata: patient, biopsy, 
microarray. In principle one would like to treat this using a model with 
nested random effects but, as recent discussion has indicated, this is not 
so straightforward.
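For reference, the three strata can be written as a nested random-effects model for each gene. This is a sketch of one standard formulation, not something stated in the thread; the symbols are introduced here for illustration, and the M-values of the dye-swapped arrays are assumed to have been sign-flipped so that every M estimates log2(after/before):

```latex
M_{gpbj} = \mu_g + u_{gp} + v_{gpb} + \varepsilon_{gpbj}, \qquad
u_{gp} \sim N(0, \sigma^2_{P,g}), \quad
v_{gpb} \sim N(0, \sigma^2_{B,g}), \quad
\varepsilon_{gpbj} \sim N(0, \sigma^2_{A,g})
```

Here g indexes genes, p patients, b the biopsy within a patient, and j the array within a biopsy; mu_g is the treatment effect of interest. The design is unbalanced because b takes two values for some patients and only one for others.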

>As you can see, different patients have one or two biopsies taken from 
>them. I realize it would be a mistake to include all of these in the 
>targets file: if I have more measurements from a certain patient, that 
>would bias the ranking of the B-statistic towards the patient with the most 
>biopsies, right? Or not?
>Because the differentially expressed genes in the patient with more 
>biopsies will get a smaller variance?
>
>My solution to the problem was to create an artificial M-matrix twice 
>as long as the original MA object. For the patients with two biopsies, I 
>averaged over the technical replicates (dye-swaps) and put the values from 
>biopsy one and then the values from biopsy two in the matrix. For patients 
>with only one biopsy (just one pair of technical replicates), I put the 
>values from the two hybridizations directly into the matrix (e.g. 
>hybridization 1A and then hybridization 1B for patient 1).
>
>The M-values of that matrix object would look something like:
>
>                  Patient 1          Patient 3                             ....
>Rows 1-30000      Hybridization 1A   Average of hybridizations 2A and 2B   ....
>Rows 30001-60000  Hybridization 1B   Average of hybridizations 3A and 3B   ....
>
>After this I plan to use dupcor on the new matrix of M-values, as if I 
>had a slide with replicate spots on it.
>
>So far so good? Is this a good way of treating replicates at different 
>levels, or does anyone have a better idea of how to do this? Comments 
>please.

This is actually very clever. You've got rid of one error stratum by 
averaging, and then you use duplicateCorrelation to handle the other. I think 
your approach is actually a good one *but* you need to give double weight to 
cases where you have averaged over two technical replicates. Use the 
'weights' component of your MAList object to do this.
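A minimal base-R sketch of Johan's construction together with Gordon's weighting rule, on simulated data. Everything here is illustrative: the list `M_arrays`, the helpers `oriented()` and `swap_avg()`, and the gene count are hypothetical, and the columns follow the patient table in the quoted message. It assumes M = log2(Cy5/Cy3), so the dye-swapped "B" hybridizations are sign-flipped so that every value estimates log2(after/before). Values averaged over a dye-swap pair get weight 2 because averaging two technical replicates halves their array-level variance.

```r
set.seed(1)
n_genes <- 1000  # hypothetical; the real arrays have about 30000 spots

# Hypothetical per-hybridization M-values, assuming M = log2(Cy5/Cy3).
# 'A' arrays have Cy5 = after; 'B' arrays (dye-swaps) have Cy5 = before.
hyb <- paste0(rep(1:9, each = 2), c("A", "B"))
M_arrays <- lapply(setNames(hyb, hyb), function(h) rnorm(n_genes))

# Orient a hybridization so that it always estimates log2(after/before)
oriented <- function(h) if (endsWith(h, "B")) -M_arrays[[h]] else M_arrays[[h]]

# Average a dye-swap pair (e.g. "2" -> 2A and 2B) after orienting both arrays
swap_avg <- function(pair) (oriented(paste0(pair, "A")) + oriented(paste0(pair, "B"))) / 2

# For each patient: first value (rows 1..n_genes), second value (rows n_genes+1..2*n_genes)
patients <- list(
  pat1  = list(oriented("1A"), oriented("1B")),  # one biopsy: the two dye-swapped arrays
  pat3  = list(swap_avg("2"),  swap_avg("3")),   # two biopsies: one averaged pair each
  pat4  = list(swap_avg("4"),  swap_avg("5")),
  pat5  = list(oriented("6A"), oriented("6B")),
  pat6  = list(oriented("7A"), oriented("7B")),
  pat7  = list(oriented("8A"), oriented("8B")),
  pat10 = list(oriented("9A"), oriented("9B"))
)
M <- sapply(patients, function(p) c(p[[1]], p[[2]]))  # 2 * n_genes rows, one column per patient

# Weight 2 where a value is the average of two technical replicates, 1 otherwise
w <- matrix(1, nrow = nrow(M), ncol = ncol(M), dimnames = dimnames(M))
w[, c("pat3", "pat4")] <- 2
```

In limma, `w` would be stored as the `weights` component of the MAList, or passed as `weights = w` to `duplicateCorrelation()` and `lmFit()`.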

>And now, how do I create a correct targets file, since I have no common 
>reference?
>I guess it would look something like this:
>
>SlideNumber     Name    FileName        Cy3     Cy5
>1       pat1_p  test1.gpr       Before_p1       After_p1
>2       pat3_p  test2.gpr       Before_p2       After_p2
>3       pat4_p  test3.gpr       Before_p3       After_p3
>4       pat6_p  test4.gpr       Before_p4       After_p4
>5       pat7_p  test5.gpr       Before_p5       After_p5
>6       pat10_p test6.gpr       Before_p6       After_p6
>
>But when I want to make my design matrix I am lost, since I do not have 
>anything to write as ref:
>design <- modelMatrix(targets, ref="????????")

If I have understood your approach, you don't need to do anything about the 
targets file or the design matrix. Just use design <- rep(1,6). You now 
have independent M-values estimating the same thing.
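Putting the pieces together, a minimal end-to-end sketch on simulated data. `duplicateCorrelation()`, `lmFit()` (with its `ndups`, `spacing`, `correlation` and `weights` arguments), `eBayes()` and `topTable()` are real limma functions, but the data, gene count, effect sizes and the choice of averaged columns are invented for illustration. The intercept-only design is Gordon's `rep(1,6)` written as a one-column matrix with one row per column of M, and the limma part is skipped if limma or statmod (which `duplicateCorrelation()` needs) is not installed.

```r
set.seed(3)
n_genes  <- 500                  # hypothetical; about 30000 on the real arrays
n_pat    <- 7                    # one column per patient in the table
averaged <- c(2, 3)              # hypothetical: columns built from averaged dye-swaps
effect   <- c(rep(1.5, 25), rep(0, n_genes - 25))  # simulated: first 25 genes respond

# Simulated stacked M-matrix: rows g and g + n_genes hold the two values for gene g
# and share a per-gene, per-patient term, so they are correlated
noise  <- function() matrix(rnorm(n_genes * n_pat, sd = 0.3), n_genes, n_pat)
shared <- noise()
M <- rbind(effect + shared + noise(), effect + shared + noise())

# Weight 2 for averaged columns, 1 otherwise
w <- matrix(1, nrow(M), n_pat)
w[, averaged] <- 2

# Intercept-only design: every column estimates the same after-vs-before log-ratio
design <- matrix(1, nrow = n_pat, ncol = 1, dimnames = list(NULL, "AfterVsBefore"))

if (requireNamespace("limma", quietly = TRUE) &&
    requireNamespace("statmod", quietly = TRUE)) {
  # Treat the two stacked values as duplicate spots spaced n_genes rows apart
  corfit <- limma::duplicateCorrelation(M, design, ndups = 2, spacing = n_genes,
                                        weights = w)
  fit <- limma::lmFit(M, design, ndups = 2, spacing = n_genes,
                      correlation = corfit$consensus.correlation, weights = w)
  fit <- limma::eBayes(fit)
  print(limma::topTable(fit, coef = 1, number = 10))
}
```

With a single coefficient there is nothing to contrast; `topTable()` simply ranks genes for that coefficient (by default on the B-statistic).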

Gordon

>If I redo the matrix to
>
>SlideNumber     Name    FileName        Cy3     Cy5
>1       pat1_p  test1.gpr       Before_p        After_p
>2       pat3_p  test2.gpr       Before_p        After_p
>3       pat4_p  test3.gpr       Before_p        After_p
>4       pat6_p  test4.gpr       Before_p        After_p
>5       pat7_p  test5.gpr       Before_p        After_p
>6       pat10_p test6.gpr       Before_p        After_p
>
>wouldn't that be the same as treating this as a common reference design 
>when it is not? And wouldn't that affect the variance of the experiment? 
>How do I do this correctly?
>I looked at the zebrafish example in the LIMMA user guide, but isn't that 
>wrong as well? Technical and biological replicates are treated the same 
>way in the targets file of the zebrafish example.
>
>I realize that many of these questions should have been considered before 
>conducting the lab part, but unfortunately they were not. So I will not be 
>surprised if someone sends me the same quote I got yesterday from a friend:
>
>"To consult a statistician after an experiment is finished is often merely 
>to ask him to conduct a post mortem examination. He can perhaps say what 
>the experiment died of."
>- R.A. Fisher, Presidential Address to the First Indian Statistical 
>Congress, 1938
>
>Best regards
>
>/Johan Lindberg
