[BioC] limma advice required. Investigating the amplification of small sample RNA

Wed Jul 26 09:20:17 CEST 2006

Dear limma experts

I am analyzing the data set given to me and described by these the 
column names of my MA object:

 > colnames(MA.hyp)
[1] "../ampcon/mev/C0-_1stround_vs_C60-_1stround_13263536.mev"
[2] "../ampcon/mev/C0-_2ndround_vs_C60-_2ndround_13263534.mev"
[3] "../ampcon/mev/C60-_1stround_vs_C0-_1stround_13263533.mev"
[4] "../ampcon/mev/C60-_2ndround_vs_C0-_2ndround_13263531.mev"
[5] "../licl/mev/C0-_vs_C60-_13260944.mev"
[6] "../licl/mev/C60-_vs_C0-_13260945.mev"

Slide 1 has RNA from sample C0- and C60- after 1 round of linear
amplification. Slide 3 is the corresponding dyeswap.

 Slide 2 has RNA from from sample C0- and C60 after 2 rounds of linear
amplification. Slide 4 is the corresponding dyeswap.

 Slide 5 has total RNA from sample C0- and C60- (i.e. now amplification).
Slide 6 is the corresponding dye-swap.
Aims:
------
My motivation is to see if using amplified RNA alters the log ratios in comparison to total RNA. I am following a paper by Nygaard et al 2003 entitled "Obtaining reliable information from minute amounts of RNA
using cDNA microarrays" BMC Genomics 2002(3). In this paper they
performed two investigations:

 a) Multiple hypothesis testing of log ratios to identify those genes
whose log ratios were significantly different if amplified RNA was used instead of total RNA

 b) Mixed effects modelling to quantify noise terms. 

I hope to do both but my initial problem concerns a). Initially I have
set up a design matrix with three protocol effects:
> hyp.design
  round_1 round_2 non_amp
1       1       0       0
2       0       1       0
3      -1       0       0
4       0      -1       0
5       0       0       1
6       0       0      -1
I fit the model thus:
>fit.hyp<-lmFit(MA.hyp,design=hyp.design)

 Which gives me estimates of the protocol effect. I would like to perform
a t-test **CAVEAT APPROACHING!** I realize that this is not the best way
to perform the analysis due to the inherent problems with ordinary t-
statistics, but my adviser would like to see how the analysis compares
with the Nygaard paper. So specific questions relating to problem a)
are:

 1) How do I carry out the t-test on a per-gene basis given the mean
protocol effect available from the fit object. I can see that the limma
guide has a way of obtaining the t-statistics but I'm not really sure
how to do the test on a per gene basis. I guess it's typical

 2) I look at the stdev.unscaled and it is the same for each protocol. Is
this to be expected? Sorry, my stats knowledge is not great.  

 round_1   round_2   non_amp
0.7071068 0.7071068 0.7071068

 3) What is the difference between sigma and stdev.unscaled?

 In addition, I thought that modelling as separate channels might be more
applicable i.e create a design matrix like this:
> design.sc
   C0.1 C60.1 C0.2 C60.2 C0.NA C60.NA
1     1     0    0     0     0      0
2     0     1    0     0     0      0
3     0     0    1     0     0      0
4     0     0    0     1     0      0
5     0     1    0     0     0      0
6     1     0    0     0     0      0
7     0     1    0     0     0      0
8     0     0    1     0     0      0
9     0     0    0     0     1      0
10    0     0    0     0     0      1
11    0     0    0     0     0      1
12    0     0    0     0     1      0

 And then fitting the contrasts such as C0.1-C0.NA and using the
moderated t-statistics to test. I realize that this would not test ratio
preservation, but would provide a measure of protocol dependent effects
on each channel. I'd appreciate any thoughts.

 Kind regards

 Andy