[BioC] Different levels of replicates and how to create a correct
targets file out of that.
Johan Lindberg
johanl at kiev.biotech.kth.se
Tue Mar 30 15:51:44 CEST 2004
Sorry, I forgot to put a subject on the mail I sent before.
Hello everyone.
I would really appreciate some comments/hints/help with a pretty long question.
I have an experiment consisting of 18 hybridizations. On the 30K cDNA
arrays, knee joint biopsies (from different patients) taken before and after a
certain treatment are hybridized. What I want to find out is the effect of
the treatment, not the difference between the patients. The problem is how
to deal with different levels of replicates and how to create a correct
targets file, since I have no common reference.
This is what the experimental set-up looks like:
Patient  Hybridization  Cy3                        Cy5
1        1A             Biopsy 1 before treatment  Biopsy 1 after treatment
         1B             Biopsy 1 after treatment   Biopsy 1 before treatment
3        2A             Biopsy 1 before treatment  Biopsy 1 after treatment
         2B             Biopsy 1 after treatment   Biopsy 1 before treatment
         3A             Biopsy 2 before treatment  Biopsy 2 after treatment
         3B             Biopsy 2 after treatment   Biopsy 2 before treatment
4        4A             Biopsy 1 before treatment  Biopsy 1 after treatment
         4B             Biopsy 1 after treatment   Biopsy 1 before treatment
         5A             Biopsy 2 before treatment  Biopsy 2 after treatment
         5B             Biopsy 2 after treatment   Biopsy 2 before treatment
5        6A             Biopsy 1 before treatment  Biopsy 1 after treatment
         6B             Biopsy 1 after treatment   Biopsy 1 before treatment
6        7A             Biopsy 1 before treatment  Biopsy 1 after treatment
         7B             Biopsy 1 after treatment   Biopsy 1 before treatment
7        8A             Biopsy 1 before treatment  Biopsy 1 after treatment
         8B             Biopsy 1 after treatment   Biopsy 1 before treatment
10       9A             Biopsy 1 before treatment  Biopsy 1 after treatment
         9B             Biopsy 1 after treatment   Biopsy 1 before treatment
As you can see, different patients have had one or two biopsies taken from them.
I realize it would be a mistake to include all of those hybridizations in the
targets file as they are, because having more measurements from a certain
patient would bias the ranking of the B-statistic towards the patients with
the most biopsies, right? Or not?
Is that because the differentially expressed genes in the patients with more
biopsies will get a smaller variance?
My solution to the problem was to create an artificial M matrix twice
as long as the one in the original MA object. For the patients with two
biopsies I averaged over the technical replicates (dye-swaps) and put the
values from biopsy one and then the values from biopsy two in the matrix.
For the patients with just a technical replicate I put the values from the
first hybridization (e.g. 1A) and then the second hybridization (e.g. 1B)
into the matrix.
The M-values of that matrix object would look something like:
                  Patient 1         Patient 3                           ....
Rows 1-30000      Hybridization 1A  Average of hybridization 2A and 2B  ....
Rows 30001-60000  Hybridization 1B  Average of hybridization 3A and 3B  ....
After this I plan to use dupcor on the new matrix of M-values, as if I
had a slide with replicate spots on it.
So far so good, or not? Is this a good way of treating replicates on different
levels, or does anyone have a better idea of how to do this? Comments
please.....
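The stacking described above could be sketched roughly as follows. This is only an illustration on simulated data, not a tested analysis: the object names (`M`, `avg_dyeswap`, `M_stacked`) are made up for the example, and it assumes M is log2(Cy5/Cy3), so that a dye-swapped array carries the opposite sign and has to be flipped before it is averaged or stacked.

```r
# Minimal sketch with simulated data; all object names are hypothetical.
set.seed(1)
n_genes <- 5   # stands in for the 30000 spots on each array

# Simulated M-values (assumed log2 Cy5/Cy3), one column per hybridization.
hybs <- c("1A", "1B", "2A", "2B", "3A", "3B")
M <- matrix(rnorm(n_genes * length(hybs)), nrow = n_genes,
            dimnames = list(NULL, hybs))

# In the "B" hybridizations the dyes are swapped, so M has the opposite
# sign. Averaging a dye-swap pair is therefore (M_A - M_B) / 2.
avg_dyeswap <- function(a, b) (M[, a] - M[, b]) / 2

# Patient 1: one biopsy, so the two technical replicates are stacked.
# 1B is sign-flipped so both blocks estimate after minus before.
pat1 <- c(M[, "1A"], -M[, "1B"])

# Patient 3: two biopsies, each averaged over its dye-swap pair,
# then stacked (biopsy 1 on top, biopsy 2 below).
pat3 <- c(avg_dyeswap("2A", "2B"), avg_dyeswap("3A", "3B"))

# One column per patient, 2 * n_genes rows.
M_stacked <- cbind(patient1 = pat1, patient3 = pat3)
dim(M_stacked)
```

With a matrix laid out like this, the "replicate spots" idea would correspond to calling limma's `duplicateCorrelation` (and later `lmFit`) with `ndups = 2` and `spacing` equal to the number of spots per array. Whether a biopsy replicate and a technical replicate should share one correlation in this way is exactly the open question of the post.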
And now, how to create a correct targets file since I have no common reference.
I guess it would look something like this:
SlideNumber  Name     FileName   Cy3        Cy5
1            pat1_p   test1.gpr  Before_p1  After_p1
2            pat3_p   test2.gpr  Before_p2  After_p2
3            pat4_p   test3.gpr  Before_p3  After_p3
4            pat6_p   test4.gpr  Before_p4  After_p4
5            pat7_p   test5.gpr  Before_p5  After_p5
6            pat10_p  test6.gpr  Before_p6  After_p6
But when I want to make my design matrix I am lost, since I do not have
anything to write as ref:
design <- modelMatrix(targets, ref="????????")
If I change the targets file to
SlideNumber  Name     FileName   Cy3       Cy5
1            pat1_p   test1.gpr  Before_p  After_p
2            pat3_p   test2.gpr  Before_p  After_p
3            pat4_p   test3.gpr  Before_p  After_p
4            pat6_p   test4.gpr  Before_p  After_p
5            pat7_p   test5.gpr  Before_p  After_p
6            pat10_p  test6.gpr  Before_p  After_p
wouldn't that be the same as treating this as a common reference design when
it is not? And wouldn't that affect the variance of the experiment? How do I
do this in a correct way?
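To make the question concrete, here is a small sketch of the design that the second targets file implies if "Before_p" is simply named as the baseline. The hand-built `design` is an illustration only, and the guarded call assumes that `modelMatrix` accepts a plain data frame with `Cy3` and `Cy5` columns. Nothing here settles whether the variance is handled correctly.

```r
# The second targets file from above, rebuilt as a data frame.
targets <- data.frame(
  SlideNumber = 1:6,
  Name = c("pat1_p", "pat3_p", "pat4_p", "pat6_p", "pat7_p", "pat10_p"),
  FileName = paste0("test", 1:6, ".gpr"),
  Cy3 = "Before_p",
  Cy5 = "After_p",
  stringsAsFactors = FALSE
)

# Hand-built design for a direct before/after comparison (illustration):
# +1 when "After_p" is in Cy5, -1 when the dyes are swapped.
# Here every slide has After_p in Cy5, so the column is all ones.
design <- matrix(ifelse(targets$Cy5 == "After_p", 1, -1), ncol = 1,
                 dimnames = list(targets$Name, "After_p"))
print(design)

# If limma is installed, naming the baseline as ref is assumed to give
# the same single "After_p" coefficient.
if (requireNamespace("limma", quietly = TRUE)) {
  print(limma::modelMatrix(targets, ref = "Before_p"))
}
```

Read this way, `ref` only names the baseline of the log-ratio, and the single coefficient is the average after-versus-before M-value across the six slides.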
I looked at the zebrafish example in the LIMMA user guide, but isn't that
wrong as well? Technical and biological replicates are treated the
same way in the targets file of the zebrafish example.
I realize that many of these questions should have been considered before
conducting the lab part but unfortunately they were not. So I will not be
surprised if someone sends me the same quote as I got yesterday from a friend:
"To consult a statistician after an experiment is finished is often merely
to ask him to conduct a post mortem examination. He can perhaps say what
the experiment died of."
- R.A. Fisher, Presidential Address to the First Indian Statistical
Congress, 1938
Best regards
/Johan Lindberg