[BioC] limma modeling, paired samples and continuous variable

Gordon K Smyth smyth at wehi.EDU.AU
Sun Apr 20 03:15:30 CEST 2014


On Sat, 19 Apr 2014, Riba Michela wrote:

> Hi Professor Gordon,
> thanks for you answer.
>
> I just want to add some observations:
> -about the factor
> I actually declared as a factor,
> but afterwards I used another :
>
>>> r<-target$Condition #this should be numeric
>
> where actually I returned back to the target file and extracted the column of the numeric variable
> producing an object, "r".
> The class of "r"is "numeric

OK.

> sorry for the misleading mentioning of a factor that I did eventually 
> not use or better when I found that was considered as factor I 
> considered how to render it numeric
>
>
> -I see the second point even if I would add that yes each Genotype has 
> the same Condition parameter but some Condition is shared between 
> Genotypes in this sense they do not represent exactly the same 
> information or better only in one sense to my understanding

All information in Condition is also contained in Genotype, but not the 
other way around.  You can examine whether Condition (one design column) 
is an adequate substitute for Genotype (8 design columns).

Best
Gordon

> in any case I would go through the results of the models you propose and 
> see differentially expressed genes. And see how I could put the 
> information coming from the two parametrizations together
>
> Anyhow if you have additional comments about this point
> I would appreciate this very much
>
> thanks a lot
>
> and
> Happy Easter
>
>
> Michela
>
> Il giorno 19/apr/2014, alle ore 01:53, Gordon K Smyth <smyth at wehi.EDU.AU> ha scritto:
>
>> Dear Riba,
>>
>> Well, a couple of points.
>>
>> First, if you want to treat Condition as numeric, then you must not declare it to be a factor.  In R, a "factor" is a variable that is categorical instead of numeric.
>>
>> Second, Condition is entirely confounded with Genotype in your experiment. Samples for the same Genotype always have the same Condition. For example, all samples of Genotype pt01 have Condition=0, all samples of Genotype pt06 have Condition=0.5, and so on.  Hence you cannot include Condition and Genotype in the same model, because they are giving the same information.  If you adjust for Genotype in the model, then you have necessarily also adjusted for the Condition.
>>
>> You need:
>>
>>  target<- readTargets("targetPTpGSp.txt")
>>  Genotype <- factor(target$Genotype)
>>  Disease<- factor(target$Disease)
>>  Condition <- target$Condition
>>
>> Then can use either:
>>
>>  design <- model.matrix(~Genotype+Disease)
>>
>> or
>>
>>  design <- mode.matrix(~Condition+Disease)
>>
>> Best wishes
>> Gordon
>>
>>> Date: Fri, 18 Apr 2014 11:19:56 +0200
>>> From: Riba Michela <riba.michela at gmail.com>
>>> To: Gordon K Smyth <smyth at wehi.edu.au>
>>> Cc: Bioconductor mailing list <bioconductor at r-project.org>, James W.
>> MacDonald <jmacdon at uw.edu>
>>> Subject: Re: limma modeling, paired samples and continuous variable
>>>
>>> Hi,
>>> thanks a lot for your answer and I'm forwarding the covariate matrix
>>> of our design.
>>>
>>> target<- readTargets("targetPTpGSp.txt")
>>> head(target)
>>>
>>> Genotype <- factor(target$Genotype)
>>> Disease<- factor(target$Disease, levels=c("stageA", "stageB",
>>> "stageC"))
>>>
>>> # Condition <-factor(target$Condition)
>>> r<-target$Condition #this should be numeric
>>>
>>> I'm just recalling the most striking parts of what I ideally would
>>> try to do and what I have already did.
>>>
>>> Till now I have performed a paired samples analysis using
>>> design <- model.matrix(~Genotype+Disease)
>>>
>>> but I would like to include also a continuous parameter ("Condition")
>>> in the model because it seems
>>> that differentially expressed genes in two different stages of the
>>> disease e.g. "stageB" results in the fit , coming from the above
>>> specified paired sample design and indicating stageB-stageA
>>> differentially expressed genes)
>>> somehow correlate with the "Condition"parameter
>>>
>>> At the model I could make it function using
>>> design<- model.matrix(~Disease+r)
>>> but not using
>>> design <- model.matrix(~Genotype+r)
>>> nor using
>>> design <- model.matrix(~Genotype+Disease+r)
>>>
>>> I'm not sure on what design I should place to try and face the
>>> question
>>> and the simplest I could imagine:
>>> design <- model.matrix(~Genotype+Disease+r)
>>>
>>> does not work
>>>
>>> I thank you very much for your supportive help
>>>
>>> Michela
>>>
>>>
>>
>>>> From smyth at wehi.edu.au Fri Apr 18 11:43:34 2014
>>>> Date: Fri, 18 Apr 2014 11:43:28 +1000 (AUS Eastern Standard Time)
>>>> From: Gordon K Smyth <smyth at wehi.edu.au>
>>>> To: Riba Michela <riba.michela at gmail.com>
>>>> Cc: Bioconductor mailing list <bioconductor at r-project.org>, James W.
>> MacDonald <jmacdon at uw.edu>
>>>> Subject: limma modeling, paired samples and continuous variable
>>>>
>>>>
>>>>> Date: Thu, 17 Apr 2014 09:26:33 +0200
>>>>> From: Riba Michela <riba.michela at gmail.com>
>>>>> To: "James W. MacDonald" <jmacdon at uw.edu>
>>>>> Cc: bioconductor at r-project.org
>>>>> Subject: Re: [BioC] limma modeling, paired samples and continuous
>>>>> 	variable
>>>>>
>>>>> Hi,
>>>>> thanks a lot for your kind answer.
>>>>> I have to specify an additional observation:
>>>>> the "r"parameter is indeed a numeric variable and also in this
>> situation
>>>>> the result is the same.
>>>>
>>>> Actually it is not possible to get the same message as before if you
>> have
>>>> correctly code r as a numeric variable.
>>>>
>>>>> Would be reasonable to try and model the design as:
>>>>> design<- <- model.matrix(~0+r)
>>>>> #where "r"is a numeric variable?
>>>>>
>>>>> for the points about the coefficients I have to reason about
>>>>
>>>> No.
>>>>
>>>> To answer your question "if differential gene expression between two
>> classes
>>>> of disease are correlated with the r status", you probably need a
>> Genotype:r
>>>> iteraction term in your model.
>>>>
>>>> You probably need to show us the whole targets frame for us to help you
>>>> further.  In other words, we need to see:
>>>>
>>>> data.frame(Genotype,Disease,r)
>>>>
>>>> Best wishes
>>>> Gordon
>>>>
>>>>> Thanks a lot
>>>>>
>>>>> Michela
>>>>> Il giorno 15/apr/2014, alle ore 16:23, James W. MacDonald
>> <jmacdon at uw.edu>
>>>>> ha scritto:
>>>>>
>>>>>> Hi Michela,
>>>>>>
>>>>>> On 4/15/2014 5:05 AM, michela riba [guest] wrote:
>>>>>>> Hi,
>>>>>>> I'm sorry for re-posting the message, but I cannot find it in the
>>>>>>> archive
>>>>>>> Thanks a lot for attention
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>> I would like to model and retrieve differential expression data
>>>>>>> regarding an experimental design in which different patients (9)
>> have
>>>>>>> different disease classes (3 disease classes) and a feature
>>>>>>> represented with a percentage (0, 0.50, 0.75,1).
>>>>>>>    some conditions are replicated 2 or 3 times, regarding the
>> disease
>>>>>>> condition
>>>>>>> Till now I have done an analysis considering Genotype and Disease in
>>>>>>> the model (as a paired samples analysis)
>>>>>>>
>>>>>>> design <- model.matrix(~Genotype+Disease)
>>>>>>> or
>>>>>>> design <- model.matrix(~0+Genotype+Disease)
>>>>>>>
>>>>>>> now I would like to model also considering
>>>>>>> a continuous variable , namely r
>>>>>>>
>>>>>>> this way: design <- model.matrix(~Genotype+Disease+r)
>>>>>>>
>>>>>>> to see if differential gene expression between two classes of
>> disease
>>>>>>> are correlated with the r status
>>>>>>>
>>>>>>> but till now it is not possible to gain results
>>>>>>> Coefficients not estimable: r0,5 r0,75 r1
>>>>>>> Warning message:
>>>>>>> Partial NA coefficients for 15246 probe(s)
>>>>>>
>>>>>> This indicates that R is using your 'r' data as factor rather than
>>>>>> numeric. I assume that is not what you want? If so you need to ensure
>>>>>> that R thinks that 'r' is a numeric vector.
>>>>>>
>>>>>> If you really are trying to treat 'r' as a factor, then note that you
>>>>>> have either an over-specified model (meaning you are trying to
>> estimate
>>>>>> more parameters than you have observations), or that three of the
>>>>>> coefficients for 'r' are linear combinations of existing coefficients
>>>>>> when you already have genotype and disease in the model.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Jim
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> if I model
>>>>>>> design <- model.matrix(~Disease+r)
>>>>>>> it goes well, but  it would not consider the different genotypes
>>>>>>>
>>>>>>> I thank you very much for attention
>>>>>>>
>>>>>>> Thanks a lot
>>>>>>>
>>>>>>> Michela
>>>>>>>
>>>>>>> -- output of sessionInfo():
>>>>>>>
>>>>>>> R version 3.0.2 (2013-09-25)
>>>>>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>>>>>
>>>>>>> locale:
>>>>>>> [1] it_IT.UTF-8/it_IT.UTF-8/it_IT.UTF-8/C/it_IT.UTF-8/it_IT.UTF-8
>>>>>>>
>>>>>>> attached base packages:
>>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>>
>>>>>>> other attached packages:
>>>>>>> [1] limma_3.18.13
>>>>>>>
>>>>>>> loaded via a namespace (and not attached):
>>>>>>> [1] tools_3.0.2
>>>>>>>
>>>>>> --
>>>>>> James W. MacDonald, M.S.
>>>>>> Biostatistician
>>>>>> University of Washington
>>>>>> Environmental and Occupational Health Sciences
>>>>>> 4225 Roosevelt Way NE, # 100
>>>>>> Seattle WA 98105-6099
>>>>>>
>>>>>
>>>>> Dr. Michela Riba
>>>>> Genome Function Unit
>>>>> Center for Translational Genomics and Bioinformatics
>>>>> San Raffaele Scientific Institute
>>>>> Via Olgettina 58
>>>>> 20132 Milano
>>>>> Italy
>>>>>
>>>>> lab: +39 02 2643 9114
>>>>> skype: mic_mir32
>>>>> riba.michela at gmail.com
>>>>> riba.michela at hsr.it
>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the addressee.
>> You must not disclose, forward, print or use it without the permission of the sender.
>> ______________________________________________________________________
>
> Dr. Michela Riba
> Genome Function Unit
> Center for Translational Genomics and Bioinformatics
> San Raffaele Scientific Institute
> Via Olgettina 58
> 20132 Milano
> Italy
>
> lab: +39 02 2643 9114
> skype: mic_mir32
> riba.michela at gmail.com
> riba.michela at hsr.it
>
>
> Se avete ricevuto il presente messaggio per errore, Vi preghiamo di darne immediata comunicazione al mittente e di provvedere alla sua cancellazione dal vostro computer. Grazie
>
> If you have received this e-mail in error, please let the sender know and delete it from your computer. Thank you
>
>
>
>
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list