[BioC] problems with paired design in limma

James W. MacDonald jmacdon at med.umich.edu
Wed Nov 26 16:55:58 CET 2008


Hi Mike,

That's what I would do.
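
A minimal sketch of why fitting each disease separately should avoid the warning, using only base R and the sample annotation from the target table quoted below. The names disease_of, targets, donor, design_all and design_msa are made up for illustration. The rows are generated in patient-by-region order rather than the array order of the real target file, so these matrices are only meant for checking rank, not for passing to lmFit() as they stand.

```r
# Disease status per donor, transcribed from the quoted target table.
disease_of <- c("628" = "PD",  "631" = "MSA", "650" = "PD",
                "755" = "PD",  "758" = "Co",  "769" = "MSA",
                "776" = "MSA", "881" = "MSA", "936" = "Co")

# One row per donor/region combination (27 arrays in total).
# NOTE: row order here is illustrative, not the order of the CEL files.
targets <- expand.grid(patient = names(disease_of),
                       region  = c("Cbm", "Cor", "Ptm"),
                       stringsAsFactors = FALSE)
targets$disease   <- unname(disease_of[targets$patient])
targets$condition <- paste(targets$disease, targets$region, sep = ".")
targets$donor     <- paste0("_", targets$patient)

# The combined design from the original post: nine disease.region
# groups plus a donor effect.
design_all <- model.matrix(~ 0 + condition + donor, data = targets)
ncol(design_all)       # 17 coefficients requested
qr(design_all)$rank    # 15 estimable, so two columns are redundant

# Donors are nested within disease, so within each disease the donor
# columns add up to the same vector as that disease's three condition
# columns. That redundancy is what lmFit() is reporting.

# Paired design within a single disease (MSA shown): donor as the
# blocking factor, region as the effect of interest.
msa <- targets[targets$disease == "MSA", ]
design_msa <- model.matrix(~ patient + region, data = msa)
ncol(design_msa)       # 6 coefficients
qr(design_msa)$rank    # 6, full rank
colnames(design_msa)
```

With the real target file, the same per-disease design would be built from the rows in array order and passed to lmFit() together with the matching columns of data.norm. The regionCor and regionPtm coefficients would then be within-donor differences relative to cerebellum.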

Best,

Jim



Michael Walter wrote:
> Hi Jim,
> 
> I perfectly agree with you that I must not block the patients when I
> want to compare MSA vs Controls. For these comparisons I fitted a
> model without the patients and this worked fine. What we also want to
> see is the difference between the different regions in the different
> diseases, e.g. Cerebellum vs Cortex in the patients having MSA. Here
> I'd like to match the samples according to the donor. Alternatively,
> can I fit three independent models, one per disease, instead of
> putting everything together in one model?
> 
> Best Regards,
> 
> Mike
> 
> 
>> Hi Mike,
>> 
>> Michael Walter wrote:
>>> Dear List,
>>> 
>>> This is one of the hundreds of "how do I create a design matrix in
>>> limma" questions. However, I have difficulties in setting up a
>>> paired design, with some error messages I really do not
>>> understand. The experiment consists of 27 U133A arrays from 9
>>> patients with 3 different conditions (2 diseases plus healthy
>>> controls). From each patient we have 3 different brain regions. I
>>> want to compare the difference between the brain regions in the
>>> different diseases. Therefore I want to match the samples from
>>> the individual patients. I attached the code below. When I try to
>>> fit the model with lmFit I get the following error message:
>>> 
>>>> fit <- lmFit(data.norm, design)
>>> Coefficients not estimable: sample_881 sample_936
>>> Warning message:
>>> In lmFit(data.norm, design) :
>>>   Some coefficients not estimable: coefficient interpretation may vary.
>>> 
>>> What I don't understand is why I can calculate the coefficients
>>> for all but 2 samples. I already double-checked my target file
>>> and design matrix and can't find any clue what might be wrong
>>> with these two samples, so any hint is highly appreciated.
>> There is nothing wrong with these samples per se. The problem
>> arises from the fact that you are trying to compute estimates for
>> too many parameters, so lmFit() is informing you of this problem.
>> 
>> When you are fitting a linear model, in essence what you are doing
>> is solving equations for multiple unknown quantities. Algebraically
>> you need one equation (or set of data) per unknown quantity. So for
>> instance, you can solve for x with one equation, but you can't
>> solve for x and y with one equation; you need two.
>> 
>> However, you can solve for some combination of x and y with just
>> one equation:
>> 
>> x - y + 4 = 25 => x - y = 21
>> 
>> So what is happening is that one or more of your coefficients may
>> be the difference between two parameter estimates, rather than the
>> estimate of a single parameter, which is what the 'coefficient
>> interpretation may vary' warning is hinting at.
>> 
>> I don't think you want to block these data on patient anyway. It
>> seems to me that you have patients with various diseases from whom
>> you have sampled brain tissue from various regions of the brain. So
>> if you want to e.g., compare the expression of genes in the
>> cerebellum of people with MSA to Co, then there is no blocking to
>> be done because people either have MSA or Co, but nobody has both.
>> 
>> Best,
>> 
>> Jim
>> 
>>> Best Regards,
>>> 
>>> Mike
>>> 
>>> 
>>> 
>>> Here is the code I used:
>>> 
>>>> target
>>>                   File disease patient     region
>>> 1  "Cbm 628 U133A.CEL"      PD     628 Cerebellum
>>> 2  "Cbm 631 U133A.CEL"     MSA     631 Cerebellum
>>> 3  "Cbm 650 U133A.CEL"      PD     650 Cerebellum
>>> 4  "Cbm 755 U133A.CEL"      PD     755 Cerebellum
>>> 5  "Cbm 758 U133A.CEL"      Co     758 Cerebellum
>>> 6  "Cbm 769 U133A.CEL"     MSA     769 Cerebellum
>>> 7  "Cbm 776 U133A.CEL"     MSA     776 Cerebellum
>>> 8  "Cbm 881 U133A.CEL"     MSA     881 Cerebellum
>>> 9  "Cbm 936 U133A.CEL"      Co     936 Cerebellum
>>> 10   "E4R_042a12b.CEL"      Co     936     Cortex
>>> 11     "I4R_012a1.CEL"      PD     628     Cortex
>>> 12    "I4R_012a11.CEL"     MSA     881     Cortex
>>> 13     "I4R_012a2.CEL"     MSA     631     Cortex
>>> 14     "I4R_012a3.CEL"      PD     650     Cortex
>>> 15     "I4R_012a6.CEL"      PD     755     Cortex
>>> 16     "I4R_012a7.CEL"      Co     758     Cortex
>>> 17     "I4R_012a8.CEL"     MSA     769     Cortex
>>> 18     "I4R_012a9.CEL"     MSA     776     Cortex
>>> 19   "pn0628_133a.CEL"      PD     628    Putamen
>>> 20   "pn0631_133a.CEL"     MSA     631    Putamen
>>> 21   "pn0650_133a.CEL"      PD     650    Putamen
>>> 22   "pn0755_133a.CEL"      PD     755    Putamen
>>> 23   "pn0758_133a.CEL"      Co     758    Putamen
>>> 24   "pn0769_133a.CEL"     MSA     769    Putamen
>>> 25   "pn0776_133a.CEL"     MSA     776    Putamen
>>> 26   "pn0881_133a.CEL"     MSA     881    Putamen
>>> 27   "pn0936_133a.CEL"      Co     936    Putamen
>>> 
>>>> condition <- as.factor(paste(disease, rep(c("Cbm", "Cor", "Ptm"), each=9), sep="."))
>>>> sample <- as.factor(paste("_", patient, sep=""))
>>>> 
>>>> 
>>>> design <- model.matrix(~0+condition+sample)
>>>> colnames(design)[1:9] <- sort(as.character(unique(condition)))
>>>> fit <- lmFit(data.norm, design)
>>> Coefficients not estimable: sample_881 sample_936
>>> Warning message:
>>> In lmFit(data.norm, design) :
>>>   Some coefficients not estimable: coefficient interpretation may vary.
>>>> sessionInfo()
>>> R version 2.7.0 (2008-04-22)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252
>>>
>>> attached base packages:
>>> [1] tools     stats     graphics  grDevices utils     datasets  methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] affy_1.18.2          preprocessCore_1.2.0 affyio_1.8.0
>>> [4] Biobase_2.0.1        limma_2.14.5
>>>
>>> loaded via a namespace (and not attached):
>>> [1] scatterplot3d_0.3-27
>>>
>>> _______________________________________________ Bioconductor
>>> mailing list Bioconductor at stat.math.ethz.ch 
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the 
>>> archives: 
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> 
>> -- James W. MacDonald, M.S. Biostatistician Hildebrandt Lab 8220D
>> MSRB III 1150 W. Medical Center Drive Ann Arbor MI 48109-0646 
>> 734-936-8662
> 
> -- Dr. Michael Walter
> 
> The Microarray Facility University of Tuebingen Calwerstr. 7 72076
> Tübingen/GERMANY
> 
> Tel.: +49 (0) 7071 29 83210 Fax. + 49 (0) 7071 29 5228

-- 
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662


