[BioC] (no subject)

Johan Lindberg johanl at kiev.biotech.kth.se
Tue Mar 30 15:48:38 CEST 2004

Hello everyone.
I would really appreciate some comments/hints/help with a pretty long question.

I have an experiment consisting of 18 hybridizations. On the 30K cDNA 
arrays, knee joint biopsies (from different patients) taken before and after 
a certain treatment are hybridized. What I want to find out is the effect of 
the treatment, not the difference between the patients. The problem is how 
to deal with different levels of replicates and how to create a correct 
targets file, since I have no common reference.
This is what the experimental set-up looks like.

Patient  Hybridization  Cy3                        Cy5
1        1A             Biopsy 1 before treatment  Biopsy 1 after treatment
         1B             Biopsy 1 after treatment   Biopsy 1 before treatment
3        2A             Biopsy 1 before treatment  Biopsy 1 after treatment
         2B             Biopsy 1 after treatment   Biopsy 1 before treatment
         3A             Biopsy 2 before treatment  Biopsy 2 after treatment
         3B             Biopsy 2 after treatment   Biopsy 2 before treatment
4        4A             Biopsy 1 before treatment  Biopsy 1 after treatment
         4B             Biopsy 1 after treatment   Biopsy 1 before treatment
         5A             Biopsy 2 before treatment  Biopsy 2 after treatment
         5B             Biopsy 2 after treatment   Biopsy 2 before treatment
5        6A             Biopsy 1 before treatment  Biopsy 1 after treatment
         6B             Biopsy 1 after treatment   Biopsy 1 before treatment
6        7A             Biopsy 1 before treatment  Biopsy 1 after treatment
         7B             Biopsy 1 after treatment   Biopsy 1 before treatment
7        8A             Biopsy 1 before treatment  Biopsy 1 after treatment
         8B             Biopsy 1 after treatment   Biopsy 1 before treatment
10       9A             Biopsy 1 before treatment  Biopsy 1 after treatment
         9B             Biopsy 1 after treatment   Biopsy 1 before treatment

As you can see, different patients have one or two biopsies taken from them. 
I realize it would be a mistake to include all of those in the targets file 
as they are: if I have more measurements from a certain patient, that would 
bias the ranking of the B-statistic towards the patient with the most 
biopsies in the end, right? Or?
That is, the differentially expressed genes in the patient with more 
biopsies would get a smaller variance?

My solution to the problem was to create an artificial M matrix twice 
as long as the original MA object. For the patients with two biopsies I 
averaged over the technical replicates (dye-swaps) and put the values from 
biopsy one and then the values from biopsy two in the matrix. For patients 
with just a technical replicate I put the values from hybridization 1A and 
then hybridization 1B into the matrix.
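
A minimal sketch of the stacking described above, in base R with made-up numbers. The names M, G, swap and M_stacked are assumptions for illustration only, and G = 4 stands in for the 30000 spots per array. The sketch also assumes that M = log2(Cy5/Cy3), so that the dye-swapped B arrays need their sign flipped before they are averaged or stacked.

```r
# Toy M-values: rows are genes, columns are hybridizations (hypothetical data).
G <- 4                                   # stand-in for 30000 spots per array
set.seed(1)
M <- matrix(rnorm(G * 6), nrow = G,
            dimnames = list(NULL, c("1A", "1B", "2A", "2B", "3A", "3B")))

# Assumption: M = log2(Cy5 / Cy3), so the dye-swapped B arrays measure
# before vs. after with the opposite sign. Flip them first.
swap <- c("1B", "2B", "3B")
M[, swap] <- -M[, swap]

# Patient 1: one biopsy, two technical replicates -> 1A on top, 1B below.
patient1 <- c(M[, "1A"], M[, "1B"])

# Patient 3: two biopsies -> average each dye-swap pair, biopsy 1 on top.
patient3 <- c(rowMeans(M[, c("2A", "2B")]),
              rowMeans(M[, c("3A", "3B")]))

# One column per patient, 2 * G rows: rows g and g + G belong to the same gene.
M_stacked <- cbind(patient1 = patient1, patient3 = patient3)
dim(M_stacked)
```

In this layout the two copies of a gene sit G rows apart, which is the spacing a replicate-spot analysis would have to be told about.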

The M-values of that matrix object would look something like:

                  patient1          patient3                            ....
Rows 1-30000      Hybridization 1A  Average of hybridization 2A and 2B  ....
Rows 30001-60000  Hybridization 1B  Average of hybridization 3A and 3B  ....

After this I plan to use dupcor on the new matrix of M-values, as if I 
had a slide with replicate spots on it.

So far so good, or? Is this a good way of treating replicates on different 
levels, or does anyone have a better idea of how to do this? Comments?

And now, how do I create a correct targets file, since I have no common reference?
I guess it would look something like this:

SlideNumber     Name    FileName        Cy3     Cy5
1       pat1_p  test1.gpr       Before_p1       After_p1
2       pat3_p  test2.gpr       Before_p2       After_p2
3       pat4_p  test3.gpr       Before_p3       After_p3
4       pat6_p  test4.gpr       Before_p4       After_p4
5       pat7_p  test5.gpr       Before_p5       After_p5
6       pat10_p test6.gpr       Before_p6       After_p6

But when I want to make my design matrix I am lost, since I do not have 
anything to write as ref:
design <- modelMatrix(targets, ref="????????")

If I redo the matrix to

SlideNumber     Name    FileName        Cy3     Cy5
1       pat1_p  test1.gpr       Before_p        After_p
2       pat3_p  test2.gpr       Before_p        After_p
3       pat4_p  test3.gpr       Before_p        After_p
4       pat6_p  test4.gpr       Before_p        After_p
5       pat7_p  test5.gpr       Before_p        After_p
6       pat10_p test6.gpr       Before_p        After_p

wouldn't that be the same as treating this as a common reference design when 
it is not? And wouldn't that affect the variance of the experiment? How do I 
do this in a correct way?
I looked at the zebrafish example in the LIMMA user guide, but isn't that 
wrong as well? Technical and biological replicates are treated the 
same way in the targets file of the zebrafish example.
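
For concreteness, here is a small base R sketch of what the second targets layout would encode once the dye-swaps are listed as separate rows. The file names and the four rows are made up for illustration and are not rows of the file above; the sketch assumes M = log2(Cy5/Cy3). It only builds the sign column by hand and makes no claim that this matches what modelMatrix would return, or that a single coefficient is the right model for the replicate structure.

```r
# Hypothetical targets for two dye-swap pairs (file names are made up).
targets <- data.frame(
  FileName = c("test1A.gpr", "test1B.gpr", "test2A.gpr", "test2B.gpr"),
  Cy3 = c("Before_p", "After_p", "Before_p", "After_p"),
  Cy5 = c("After_p", "Before_p", "After_p", "Before_p"),
  stringsAsFactors = FALSE
)

# One coefficient (after minus before): +1 when After_p is in Cy5,
# -1 when the dyes are swapped. Assumes M = log2(Cy5 / Cy3).
design <- cbind(
  After_vs_Before = (targets$Cy5 == "After_p") - (targets$Cy3 == "After_p")
)
design
```

Every array here compares before against after directly, so no array contributes a zero to the column, which is what distinguishes it from a layout where a shared reference sample sits in one channel.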

I realize that many of these questions should have been considered before 
conducting the lab part but unfortunately they were not. So I will not be 
surprised if someone sends me the same quote as I got yesterday from a friend:

"To consult a statistician after an experiment is finished is often merely 
to ask him to conduct a post mortem examination. He can perhaps say what 
the experiment died of."
- R.A. Fisher, Presidential Address to the First Indian Statistical 
Congress, 1938

Best regards

/Johan Lindberg
