[BioC] Cross-comparison of independent intensities from different experiments (genepix) (sorry I don't know how to describe the problem better)

Fri Feb 3 15:09:33 CET 2012

Hi Susanne,

On 2/3/2012 7:53 AM, Susanne Gerber [guest] wrote:
> Dear all,
> could please anyone help me with the following problem:
>
> Experiments were performed using two color cDNA .gpr files (genepix).
> We have an experimental setup with two independent time series, each with 4 time points (T1 - T4 in the following).
>
> In the first time series, wild-type (WT) cells were stressed at time point zero with a certain drug and samples were taken at 4 time points afterwards.
> These samples were compared with the unstressed WT.
>
> In the second time series, mutant (MU) cells were treated identically and compared with the unstressed MU cells.
>
>
> Here is the target file
>
>> targets
>    FileName      Cy3           Cy5
> 1  13754122.gpr  WT            WT_stress_T1
> 2  13754112.gpr  WT_stress_T1  WT
> 3  14039687.gpr  WT            WT_stress_T2
> 4  13754123.gpr  WT            WT_stress_T2
> 5  13754109.gpr  WT            WT_stress_T3
> 6  14039055.gpr  WT_stress_T3  WT
> 7  14004643.gpr  WT            WT_stress_T4
> 8  14039058.gpr  WT_stress_T4  WT
> 9  14039688.gpr  MU            MU_stress_T1
> 10 13754114.gpr  MU_stress_T1  MU
> 11 14039061.gpr  MU            MU_stress_T2
> 12 14039059.gpr  MU_stress_T2  MU
> 13 13754124.gpr  MU            MU_stress_T3
> 14 13754115.gpr  MU_stress_T3  MU
> 15 14039057.gpr  MU            MU_stress_T4
> 16 14039056.gpr  MU_stress_T4  MU
>
> I have worked a lot with these data and we got some very interesting results. However, I am not able to solve the following problem:
>
> How can I make a comparison between
> a) MU and WT
> b) MU_stressed and WT

That's because this is an unsolvable problem with the data in hand. I 
assume that by 'two independent time series' you mean that these 
experiments were conducted at different times, perhaps in different 
labs, etc?

There are two problems here. First, depending on what you mean by 
'independent time series', a batch effect may have been introduced, 
which you will not be able to account for statistically. However, 
depending on the nature of the independence between these time series, 
you may be able to get away with assuming little or no batch effect. But 
you will have to make that assumption without really being able to test it.

The second problem is due to the fact that you never hybridized MU and 
WT samples on the same chip, which has introduced another untestable and 
unquantifiable 'chip' effect. You could hypothetically do a single 
channel analysis with these data, but any comparison between MU and WT 
would include both biological and technical variability, and you won't 
be able to say how much of either. Again, you can assume that the 
technical variability is small, but you won't really be able to say for 
sure if this assumption is true.
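
To make the second problem concrete, here is a minimal base-R sketch
(not limma code) of the log-ratio design implied by the targets file.
It rests on my own assumptions: each array contributes one log-ratio
M = Cy5 - Cy3, and the helper names `contrast` and `estimable` are made
up for this illustration. A comparison is estimable from the log-ratios
only if appending it as an extra row does not raise the rank of the
design matrix.

```r
# RNA sources, named as in the targets file
sources <- c("WT", paste0("WT_stress_T", 1:4),
             "MU", paste0("MU_stress_T", 1:4))

# Cy3 and Cy5 columns transcribed from the targets file (arrays 1 to 16)
cy3 <- c("WT", "WT_stress_T1", "WT", "WT",
         "WT", "WT_stress_T3", "WT", "WT_stress_T4",
         "MU", "MU_stress_T1", "MU", "MU_stress_T2",
         "MU", "MU_stress_T3", "MU", "MU_stress_T4")
cy5 <- c("WT_stress_T1", "WT", "WT_stress_T2", "WT_stress_T2",
         "WT_stress_T3", "WT", "WT_stress_T4", "WT",
         "MU_stress_T1", "MU", "MU_stress_T2", "MU",
         "MU_stress_T3", "MU", "MU_stress_T4", "MU")

# One row per array: +1 for the Cy5 source, -1 for the Cy3 source
X <- matrix(0, nrow = length(cy3), ncol = length(sources),
            dimnames = list(NULL, sources))
for (i in seq_along(cy3)) {
  X[i, cy5[i]] <- 1
  X[i, cy3[i]] <- -1
}

# Hypothetical helper: contrast vector "plus minus minus"
contrast <- function(plus, minus) {
  v <- setNames(numeric(length(sources)), sources)
  v[plus] <- 1
  v[minus] <- -1
  v
}

# Hypothetical helper: a contrast is estimable if it lies in the row
# space of X, i.e. adding it as a row leaves the rank unchanged
estimable <- function(v, X) {
  qr(rbind(X, v))$rank == qr(X)$rank
}

estimable(contrast("WT_stress_T1", "WT"), X)   # within the WT series
estimable(contrast("MU", "WT"), X)             # question a)
estimable(contrast("MU_stress_T1", "WT"), X)   # question b)

# Difference in stress response between MU and WT at T1
estimable(contrast("MU_stress_T1", "MU") -
          contrast("WT_stress_T1", "WT"), X)
```

The within-series comparison and the difference in stress response
should come out as estimable, while MU vs WT and MU_stressed vs WT
should not, because no array links the two series.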

To a certain extent, both time series have to be independent, as MU and 
WT cells are different. So if 'independent time series' just means that 
the experimenter did the WT time series and then did the MU time series, 
that's a batch effect that people ignore all the time, and I don't see a 
need to repeat the experiment. But if the experimenter really wants to 
compare the MU and WT samples directly, they need to be hybridized to 
the same chips, preferably in one of these 'round-robin' type designs 
where you do things like

MU1 vs WT1
MU stressed1 vs WT2
MU stressed2 vs WT stressed1
MU2 vs WT stressed2

which tends to reduce variability for the comparisons of interest. The 
limma User's Guide has a section on direct two-color designs that is 
relevant here. The maanova package was designed specifically for this 
type of analysis, so you might look at that package as well; I assume 
there is a vignette that may have helpful insights. You could also look 
at some of Katie Kerr's papers (do a Google Scholar search for kerr 
anova microarray).
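
As a rough illustration of why such a design helps, here is a small
base-R sketch with simulated numbers (not real data). The assumptions
are mine: the four hybridizations listed above put the first-named
sample in Cy5, biological replicates are collapsed into four RNA
sources, WT is taken as the reference, and the "true" log2 levels are
invented for the simulation.

```r
sources <- c("WT", "WT_stressed", "MU", "MU_stressed")

# The four hybridizations of the loop, first-named sample assumed in Cy5
cy5 <- c("MU", "MU_stressed", "MU_stressed", "MU")
cy3 <- c("WT", "WT",          "WT_stressed", "WT_stressed")

# Log-ratio design: +1 for the Cy5 source, -1 for the Cy3 source
X <- matrix(0, nrow = 4, ncol = 4, dimnames = list(NULL, sources))
X[cbind(1:4, match(cy5, sources))] <- 1
X[cbind(1:4, match(cy3, sources))] <- -1

# Use WT as the reference by dropping its column
design <- X[, -1]
qr(design)$rank   # full column rank means every source is comparable to WT

# Invented log2 levels relative to WT, used only to simulate one gene
truth <- c(WT_stressed = 1, MU = 0.5, MU_stressed = 2)
set.seed(1)
M <- drop(design %*% truth) + rnorm(4, sd = 0.1)

# Least-squares estimates of each source relative to WT
fit <- lm.fit(design, M)
fit$coefficients
```

Because every sample is connected to every other through shared
arrays, the MU vs WT coefficient is estimated from within-array
log-ratios only, which is what the existing data cannot provide.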

Best,

Jim
>
> I am not the experimenter and it is also not possible to repeat the experiment and produce a direct comparison.
>
> However, I think - even if it is not the most elegant way - there should be a way to make this comparison with the existing data.
>
> I was already thinking of simply copying and pasting the single channel intensities from the .gpr files into a new matrix, but I guess this would cause a lot of problems concerning normalization steps.
> Perhaps the answer is very easy, - then sorry for bothering you - but I swear I was reading a lot (tutorials) but actually I don't even know what keywords to search (google) for this problem.
>
> What I do right now (after preprocessing) is:
> #
> #
> Average <- avedups(genes, ndups = 2, spacing = 1)
> Average$A[is.na(Average$A)] <- 0.0
> Average$M[is.na(Average$M)] <- 0.0
> #
> designWT <- modelMatrix(targets, ref = "WT")
> designWT <- designWT[1:8, 1:4]
> designWT
> designMU <- modelMatrix(targets, ref = "MU")
> designMU <- designMU[9:16, 6:9]
> designMU
>
> AverageWT <- Average[, 1:8]
> AverageMU <- Average[, 9:16]
> #
> fit_WT <- lmFit(AverageWT, designWT)
> fit_WT <- eBayes(fit_WT)
> topTable(fit_WT)
> fit_MU <- lmFit(AverageMU, designMU)
> fit_MU <- eBayes(fit_MU)
> topTable(fit_MU)
>
> #
> .... and further analysis and evaluation procedures
> #
>
>
> Please, what would be the best way to make the comparison
>
> a) MU_(T1-4) with WT as reference
> and
> b) MU_stressed (T1-4) with WT as a reference?
>
> Thanks a lot in advance for the help !
> I would be so grateful if someone could give me an answer.
>
> Best regards,
> Susanne
>
>
>
>   -- output of sessionInfo():
>
>> sessionInfo()
> R version 2.13.2 (2011-09-30)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] C/en_US.UTF-8/C/C/C/C
>
> attached base packages:
> [1] splines   tcltk     stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] MASS_7.3-14          calibrate_1.7        Heatplus_1.22.0      XML_3.4-3            annaffy_1.24.0       KEGG.db_2.5.0
>   [7] goProfiles_1.14.0    GO.db_2.5.0          annotate_1.30.1      yeast2.db_2.5.0      org.Sc.sgd.db_2.5.0  RSQLite_0.10.0
> [13] DBI_0.2-5            AnnotationDbi_1.14.1 statmod_1.4.14       vsn_3.20.0           arrayQuality_1.30.0  convert_1.28.0
> [19] affy_1.30.0          marray_1.30.0        limma_3.8.3          maSigPro_1.24.1      DynDoc_1.30.0        widgetTools_1.30.0
> [25] Biobase_2.12.2
>
> loaded via a namespace (and not attached):
>   [1] Mfuzz_2.10.0          RColorBrewer_1.0-5    affyio_1.20.0         grid_2.13.2           gridBase_0.4-4        hexbin_1.26.0
>   [7] lattice_0.19-33       preprocessCore_1.14.0 tkWidgets_1.30.0      tools_2.13.2          xtable_1.6-0
> --
> Sent via the guest posting facility at bioconductor.org.

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
