[BioC] problems with paired design in limma
James W. MacDonald
jmacdon at med.umich.edu
Wed Nov 26 16:55:58 CET 2008
Hi Mike,
That's what I would do.
Best,
Jim
Michael Walter wrote:
> Hi Jim,
>
> I perfectly agree with you that I must not block the patients when I
> want to compare MSA vs Controls. For these comparisons I fitted a
> model without the patients and this worked fine. What we also want to
> see is the difference between the different regions in the different
> diseases, e.g. Cerebellum vs Cortex in the patients having MSA. Here
> I'd like to match the samples according to the donor. Can I
> alternatively try to fit three independent models for each disease
> instead of putting all together in one model?
>
> Best Regards,
>
> Mike
>
>
>> Hi Mike,
>>
>> Michael Walter wrote:
>>> Dear List,
>>>
>>> This one of the hundreds of "how do I create a design matrix in
>>> limma question". However, I have difficulties in setting up a
>>> paired design, with some error messages I really do not
>>> understand. The experiment consists of 27 U133A arrays from 9
>>> patients with 3 different conditions (2 diseases plus healthy
>>> controls). From each patient we have 3 different brain regions. I
>>> want to compare the difference between the brain regions in the
>>> different diseases. therefore I want to match the samples from
>>> the individual patients. I attached the code below. When I try to
>>> fit the model with lmFit I get following error message:
>>>
>>>> fit <- lmFit(data.norm, design)
>>> Coefficients not estimable: sample_881 sample_936 Warning
>>> message: In lmFit(data.norm, design) : Some coefficients not
>>> estimable: coefficient interpretation may vary.
>>>
>>> What I dont understand is why can I calculate the coefficients
>>> for all but 2 samples? I allready doublechecked my target file
>>> and design matrix and can't find any clue what might be wrong
>>> with these two samples, so any hint is highly appreciated.
>> There is nothing wrong with these samples per se. The problem
>> arises from the fact that you are trying to compute estimates for
>> too many parameters, so lmFit() is informing you of this problem.
>>
>> When you are fitting a linear model, in essence what you are doing
>> is solving equations for multiple unknown quantities. Algebraically
>> you need one equation (or set of data) per unknown quantity. So for
>> instance, you can solve for x with one equation, but you can't
>> solve for x and y with one equation, you need two.
>>
>> However, you can solve for some combination of x and y with just
>> one equation:
>>
>> x - y + 4 = 25 => x - y = 21
>>
>> So what is happening is that one or more of your coefficients may
>> be the difference between two parameter estimates, rather than the
>> estimate of a single parameter. Which is what the 'coefficient
>> interpretation may vary' is hinting at.
>>
>> I don't think you want to block these data on patient anyway. It
>> seems to me that you have patients with various diseases from whom
>> you have sampled brain tissue from various regions of the brain. So
>> if you want to e.g., compare the expression of genes in the
>> cerebellum of people with MSA to Co, then there is no blocking to
>> be done because people either have MSA or Co, but nobody has both.
>>
>> Best,
>>
>> Jim
>>
>>> Best Regards,
>>>
>>> Mike
>>>
>>>
>>>
>>> Here is the code I used:
>>>
>>>> target
>>> File disease patient region 1 "Cbm 628 U133A.CEL" PD 628
>>> Cerebellum 2 "Cbm 631 U133A.CEL" MSA 631 Cerebellum 3 "Cbm 650
>>> U133A.CEL" PD 650 Cerebellum 4 "Cbm 755 U133A.CEL" PD 755
>>> Cerebellum 5 "Cbm 758 U133A.CEL" Co 758 Cerebellum 6 "Cbm 769
>>> U133A.CEL" MSA 769 Cerebellum 7 "Cbm 776 U133A.CEL" MSA 776
>>> Cerebellum 8 "Cbm 881 U133A.CEL" MSA 881 Cerebellum 9 "Cbm 936
>>> U133A.CEL" Co 936 Cerebellum 10 "E4R_042a12b.CEL" Co 936 Cortex
>>> 11 "I4R_012a1.CEL" PD 628 Cortex 12 "I4R_012a11.CEL" MSA 881
>>> Cortex 13 "I4R_012a2.CEL" MSA 631 Cortex 14 "I4R_012a3.CEL" PD
>>> 650 Cortex 15 "I4R_012a6.CEL" PD 755 Cortex 16 "I4R_012a7.CEL" Co
>>> 758 Cortex 17 "I4R_012a8.CEL" MSA 769 Cortex 18 "I4R_012a9.CEL"
>>> MSA 776 Cortex 19 "pn0628_133a.CEL" PD 628 Putamen 20
>>> "pn0631_133a.CEL" MSA 631 Putamen 21 "pn0650_133a.CEL" PD 650
>>> Putamen 22 "pn0755_133a.CEL" PD 755 Putamen 23 "pn0758_133a.CEL"
>>> Co 758 Putamen 24 "pn0769_133a.CEL" MSA 769 Putamen 25
>>> "pn0776_133a.CEL" MSA 776 Putamen 26 "pn0881_133a.CEL" MSA 881
>>> Putamen 27 "pn0936_133a.CEL" Co 936 Putamen
>>>
>>>> condition <- as.factor(paste(disease, rep(c("Cbm", "Cor",
>>>> "Ptm"), each=9), sep=".")) sample <- as.factor(paste("_",
>>>> patient, sep=""))
>>>>
>>>>
>>>> design <- model.matrix(~0+condition+sample)
>>>> colnames(design)[1:9] <- sort(as.character(unique(condition)))
>>>> fit <- lmFit(data.norm, design)
>>> Coefficients not estimable: sample_881 sample_936 Warning
>>> message: In lmFit(data.norm, design) : Some coefficients not
>>> estimable: coefficient interpretation may vary.
>>>> sessionInfo()
>>> R version 2.7.0 (2008-04-22) i386-pc-mingw32
>>>
>>> locale:
>>> LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252
>>>
>>>
>>>
>>> attached base packages: [1] tools stats graphics grDevices utils
>>> datasets methods [8] base
>>>
>>> other attached packages: [1] affy_1.18.2 preprocessCore_1.2.0
>>> affyio_1.8.0 [4] Biobase_2.0.1 limma_2.14.5
>>>
>>> loaded via a namespace (and not attached): [1]
>>> scatterplot3d_0.3-27
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>>
>>> _______________________________________________ Bioconductor
>>> mailing list Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
>>> archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>> -- James W. MacDonald, M.S. Biostatistician Hildebrandt Lab 8220D
>> MSRB III 1150 W. Medical Center Drive Ann Arbor MI 48109-0646
>> 734-936-8662
>>
>> _______________________________________________ Bioconductor
>> mailing list Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
>> archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> -- Dr. Michael Walter
>
> The Microarray Facility University of Tuebingen Calwerstr. 7 72076
> Tübingen/GERMANY
>
> Tel.: +49 (0) 7071 29 83210 Fax. + 49 (0) 7071 29 5228
>
> Confidentiality Note: This message is intended only for the use of
> the named recipient(s) and may contain confidential and/or
> proprietary information. If you are not the intended recipient,
> please contact the sender and delete the message. Any unauthorized
> use of the information contained in this message is prohibited
--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
More information about the Bioconductor
mailing list