[BioC] Cross-comparison of independent intensities from different experiments (genepix) (sorry I don't know how to describe the problem better)
James W. MacDonald
jmacdon at med.umich.edu
Fri Feb 3 15:09:33 CET 2012
Hi Susanne,
On 2/3/2012 7:53 AM, Susanne Gerber [guest] wrote:
> Dear all,
> could anyone please help me with the following problem:
>
> Experiments were performed using two-color cDNA arrays (GenePix .gpr files).
> We have an experimental setup with two independent time series, each with 4 time points (referred to below as T1-T4).
>
> In the first time series, wild-type (WT) cells were stressed at time point zero with a certain drug, and samples were taken at 4 time points afterwards.
> These samples were compared with unstressed WT cells.
>
> In the second time series, mutant (MU) cells were treated identically and compared with unstressed MU cells.
>
>
> Here is the targets file:
>
>> targets
> FileName Cy3 Cy5
> 1 13754122.gpr WT WT_stress_T1
> 2 13754112.gpr WT_stress_T1 WT
> 3 14039687.gpr WT WT_stress_T2
> 4 13754123.gpr WT WT_stress_T2
> 5 13754109.gpr WT WT_stress_T3
> 6 14039055.gpr WT_stress_T3 WT
> 7 14004643.gpr WT WT_stress_T4
> 8 14039058.gpr WT_stress_T4 WT
> 9 14039688.gpr MU MU_stress_T1
> 10 13754114.gpr MU_stress_T1 MU
> 11 14039061.gpr MU MU_stress_T2
> 12 14039059.gpr MU_stress_T2 MU
> 13 13754124.gpr MU MU_stress_T3
> 14 13754115.gpr MU_stress_T3 MU
> 15 14039057.gpr MU MU_stress_T4
> 16 14039056.gpr MU_stress_T4 MU
>
> I have worked a lot with these data and we have obtained some very interesting results; however, I am not able to solve the following problem:
>
> How can I make a comparison between
> a) MU and WT
> b) MU_stressed and WT
That's because this is an unsolvable problem with the data in hand. I
assume that by 'two independent time series' you mean that these
experiments were conducted at different times, perhaps in different
labs, etc?
There are two problems here. First, depending on what you mean by
'independent time series', a batch effect may have been introduced,
which you will not be able to account for statistically. However,
depending on the nature of the independence between these time series,
you may be able to get away with assuming little or no batch effect. But
you will have to make that assumption without really being able to test it.
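A small base-R sketch of why a batch effect that coincides with genotype cannot be accounted for statistically. It assumes a hypothetical sample sheet in which every WT array was processed in one batch and every MU array in another; the batch labels and the response values are made up for illustration. The genotype and batch columns of the design matrix come out identical, so the model is rank-deficient and `lm()` reports `NA` for the aliased coefficient.

```r
# Hypothetical sample sheet: every WT array in batch b1, every MU array in b2.
samples <- data.frame(
  genotype = factor(rep(c("WT", "MU"), each = 4), levels = c("WT", "MU")),
  batch    = factor(rep(c("b1", "b2"), each = 4))
)

# Intercept + genotype effect + batch effect
X <- model.matrix(~ genotype + batch, data = samples)
print(X)

# The genotypeMU and batchb2 columns are identical, so only 2 of the
# 3 parameters can be estimated.
design_rank <- qr(X)$rank
print(c(parameters = ncol(X), rank = design_rank))

# Fitting made-up data for one gene: lm() cannot separate the two effects
# and returns NA for the aliased batch coefficient.
set.seed(1)
samples$y <- rnorm(nrow(samples))
fit <- lm(y ~ genotype + batch, data = samples)
print(coef(fit))
```

Whatever is estimated as "genotype" here is really genotype plus batch; no choice of model can split the two without arrays where they are not confounded.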
The second problem is due to the fact that you never hybridized MU and
WT samples on the same chip, which has introduced another untestable and
unquantifiable 'chip' effect. You could hypothetically do a single
channel analysis with these data, but any comparison between MU and WT
would include both biological and technical variability, and you won't
be able to say how much of either. Again, you can assume that the
technical variability is small, but you won't really be able to say for
sure if this assumption is true.
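To make the "never on the same chip" point concrete, here is a base-R sketch (no limma needed) that builds a reference-design matrix for the 16 arrays in the targets file, using the convention M = log2(Cy5/Cy3) with WT as the single reference. It is in the same spirit as limma's `modelMatrix()`, but the helpers `ref_design()`, `contrast_vec()` and `is_estimable()` are hypothetical, written only for this illustration. A comparison is estimable only if its contrast vector lies in the row space of the design matrix; the check shows that MU vs WT and MU_stressed vs WT are not estimable from the log-ratios, while the within-series comparisons are.

```r
# Targets from the post: one entry per array (rows 1-16 of the targets file).
cy3 <- c("WT", "WT_stress_T1", "WT", "WT", "WT", "WT_stress_T3", "WT", "WT_stress_T4",
         "MU", "MU_stress_T1", "MU", "MU_stress_T2", "MU", "MU_stress_T3", "MU", "MU_stress_T4")
cy5 <- c("WT_stress_T1", "WT", "WT_stress_T2", "WT_stress_T2", "WT_stress_T3", "WT",
         "WT_stress_T4", "WT", "MU_stress_T1", "MU", "MU_stress_T2", "MU",
         "MU_stress_T3", "MU", "MU_stress_T4", "MU")

# Hypothetical helper: +1 for the Cy5 sample, -1 for the Cy3 sample,
# no column for the reference (its effect is fixed at 0).
ref_design <- function(cy3, cy5, ref) {
  levs <- setdiff(sort(unique(c(cy3, cy5))), ref)
  X <- matrix(0, nrow = length(cy3), ncol = length(levs),
              dimnames = list(NULL, levs))
  for (i in seq_along(cy3)) {
    if (cy5[i] != ref) X[i, cy5[i]] <- X[i, cy5[i]] + 1
    if (cy3[i] != ref) X[i, cy3[i]] <- X[i, cy3[i]] - 1
  }
  X
}

# Hypothetical helpers: build a contrast vector by name, and test whether
# it lies in the row space of X (i.e. whether it is estimable).
contrast_vec <- function(X, coefs) {
  v <- setNames(numeric(ncol(X)), colnames(X))
  v[names(coefs)] <- coefs
  v
}
is_estimable <- function(X, v) qr(rbind(X, v))$rank == qr(X)$rank

X <- ref_design(cy3, cy5, ref = "WT")
print(c(parameters = ncol(X), rank = qr(X)$rank))  # rank-deficient

print(is_estimable(X, contrast_vec(X, c(WT_stress_T1 = 1))))           # WT_stress_T1 vs WT
print(is_estimable(X, contrast_vec(X, c(MU_stress_T1 = 1, MU = -1))))  # MU_stress_T1 vs MU
print(is_estimable(X, contrast_vec(X, c(MU = 1))))                     # a) MU vs WT
print(is_estimable(X, contrast_vec(X, c(MU_stress_T1 = 1))))           # b) MU_stress_T1 vs WT
```

The same check shows that a difference of stress responses, such as (MU_stress_T1 - MU) - (WT_stress_T1 - WT), is estimable, because it only involves within-series comparisons; it is the baseline MU vs WT difference that cancels out of every log-ratio.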
To a certain extent, both time series have to be independent, as MU and
WT cells are different. So if 'independent time series' just means that
the experimenter did the WT time series and then did the MU time series,
that's a batch effect that people ignore all the time, and I don't see a
need to repeat the experiment. But if the experimenter really wants to
compare the MU and WT samples directly, they need to be hybridized to
the same chips, preferably in one of these 'round-robin' type designs
where you do things like
MU1 vs WT1
MU stressed1 vs WT2
MU stressed2 vs WT stressed1
MU2 vs WT stressed2
which tends to reduce variability for comparisons. There may be
something about these types of design in the limma user's guide. The
maanova package was designed specifically for this type of analysis, so
you might look at that package as well; I assume there is a vignette
that may have helpful insights. You could also look at some of Katie
Kerr's papers (do a Google Scholar search for kerr anova microarray).
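As a base-R sketch of why a round-robin layout like the four-chip example helps, the code below builds its design matrix with WT as the reference. It drops the replicate numbers (treating samples of the same condition as one group) and assumes, since the list does not say, that the first sample named on each chip is labelled Cy5 and the second Cy3; the dye assignment does not change which comparisons are estimable. Because every condition is linked to WT through some chain of chips, the matrix has full column rank. The diagonal of (X'X)^-1 gives the unscaled variance of each coefficient: conditions hybridized directly against WT come out lower than WT stressed, which is reached only indirectly.

```r
# Four-chip round-robin design from the reply; replicate numbers dropped.
# Assumption: first sample named on each chip is Cy5, second is Cy3.
cy5 <- c("MU", "MU_stress", "MU_stress", "MU")
cy3 <- c("WT", "WT", "WT_stress", "WT_stress")

# One column per non-reference condition (WT is the reference, effect 0);
# the log-ratio M = log2(Cy5/Cy3) gets +1 for Cy5 and -1 for Cy3.
levs <- c("MU", "MU_stress", "WT_stress")
X <- matrix(0, nrow = length(cy5), ncol = length(levs),
            dimnames = list(paste0("chip", seq_along(cy5)), levs))
for (i in seq_along(cy5)) {
  if (cy5[i] %in% levs) X[i, cy5[i]] <- X[i, cy5[i]] + 1
  if (cy3[i] %in% levs) X[i, cy3[i]] <- X[i, cy3[i]] - 1
}
print(X)

# Full column rank: MU vs WT and MU_stress vs WT are both estimable.
print(qr(X)$rank == ncol(X))

# Unscaled variances of the coefficients (each is a comparison with WT)
unscaled_var <- diag(solve(crossprod(X)))
print(unscaled_var)  # MU 0.75, MU_stress 0.75, WT_stress 1
```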
Best,
Jim
>
> I am not the experimenter, and it is not possible to repeat the experiment to produce a direct comparison.
>
> However, I think there should be a way to make this comparison with the existing data, even if it is not the most elegant one.
>
> I was already thinking of simply "copy and pasting" the single-channel intensities from the .gpr files into a new matrix, but I guess this would cause a lot of problems with the normalization steps.
> Perhaps the answer is very easy (in which case, sorry for bothering you), but I swear I have read a lot of tutorials, and I don't even know which keywords to search for on Google for this problem.
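For the "copy and paste" idea, the single-channel log-intensities do not have to be re-read from the .gpr files: with limma's definitions M = log2(R/G) and A = (log2 R + log2 G)/2, they follow from log2 R = A + M/2 and log2 G = A - M/2. Below is a minimal base-R sketch, with a made-up 2 x 2 example standing in for `Average$M` and `Average$A` (the numbers are purely illustrative). Two caveats, offered as assumptions rather than a recipe: this should be done before missing values are replaced by 0, since a zero A turns into a meaningless log-intensity, and the stacked channels would still need between-array normalization, with the two channels of each array remaining correlated through the shared spot. limma's user's guide describes a separate-channel analysis (`intraspotCorrelation()` and `lmscFit()`) intended for that situation.

```r
# Made-up values standing in for Average$M and Average$A
# (2 genes x 2 arrays); purely illustrative.
M <- matrix(c(1, -0.5, 0.2, 2), nrow = 2,
            dimnames = list(c("gene1", "gene2"), c("array1", "array2")))
A <- matrix(c(8, 10, 9, 7), nrow = 2, dimnames = dimnames(M))

# Invert M = log2(R/G) and A = (log2 R + log2 G)/2
log2R <- A + M / 2  # red channel (Cy5)
log2G <- A - M / 2  # green channel (Cy3)

# One column per channel per array, ready for a single-channel analysis
single <- cbind(log2R, log2G)
colnames(single) <- c(paste0(colnames(M), ".Cy5"), paste0(colnames(M), ".Cy3"))
print(single)
```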
>
> What I do right now (after preprocessing) is:
> library(limma)
> # average duplicate spots (spacing=1), then set missing A and M values to 0
> Average<- avedups(genes, ndups=2, spacing=1)
> Average$A[ is.na(Average$A) ]<- 0.0
> Average$M[ is.na(Average$M) ]<- 0.0
> # reference designs, kept separately for the WT arrays (1-8) and the MU arrays (9-16)
> designWT<- modelMatrix(targets,ref="WT")
> designWT<- designWT[1:8,1:4]
> designWT
> designMU<- modelMatrix(targets,ref="MU")
> designMU<- designMU[9:16,6:9]
> designMU
>
> AverageWT<- Average[,1:8]
> AverageMU<- Average[,9:16]
> # fit each series separately, then apply empirical Bayes moderation
> fit_WT<- lmFit(AverageWT, designWT)
> fit_WT<- eBayes(fit_WT)
> topTable(fit_WT)
> fit_MU<- lmFit(AverageMU, designMU)
> fit_MU<- eBayes(fit_MU)
> topTable(fit_MU)
>
> #
> .... and further analysis and evaluation procedures
> #
>
>
> What would be the best way to make the comparisons
>
> a) MU_(T1-4) with WT as reference
> and
> b) MU_stressed (T1-4) with WT as a reference?
>
> Thanks a lot in advance for the help !
> I would be so grateful if someone could give me an answer.
>
> Best regards,
> Susanne
>
>
>
> -- output of sessionInfo():
>
>> sessionInfo()
> R version 2.13.2 (2011-09-30)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] C/en_US.UTF-8/C/C/C/C
>
> attached base packages:
> [1] splines tcltk stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] MASS_7.3-14 calibrate_1.7 Heatplus_1.22.0 XML_3.4-3 annaffy_1.24.0 KEGG.db_2.5.0
> [7] goProfiles_1.14.0 GO.db_2.5.0 annotate_1.30.1 yeast2.db_2.5.0 org.Sc.sgd.db_2.5.0 RSQLite_0.10.0
> [13] DBI_0.2-5 AnnotationDbi_1.14.1 statmod_1.4.14 vsn_3.20.0 arrayQuality_1.30.0 convert_1.28.0
> [19] affy_1.30.0 marray_1.30.0 limma_3.8.3 maSigPro_1.24.1 DynDoc_1.30.0 widgetTools_1.30.0
> [25] Biobase_2.12.2
>
> loaded via a namespace (and not attached):
> [1] Mfuzz_2.10.0 RColorBrewer_1.0-5 affyio_1.20.0 grid_2.13.2 gridBase_0.4-4 hexbin_1.26.0
> [7] lattice_0.19-33 preprocessCore_1.14.0 tkWidgets_1.30.0 tools_2.13.2 xtable_1.6-0
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues