[BioC] question about lmFit model

Christos Hatzis christos.hatzis at nuverabio.com
Mon Jan 25 21:30:56 CET 2010


This strategy is bound to be less efficient, though.
See a recent article on this subject.
http://www.biomedcentral.com/1471-2105/10/402

-Christos


Christos Hatzis, Ph.D.
Nuvera Biosciences, Inc.
400 West Cummings Park, Suite 5350
Woburn, MA 01801
781-938-3844



-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of sabrina s
Sent: Monday, January 25, 2010 3:17 PM
To: Sunny Srivastava
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] question about lmFit model

Dear Sunny:
Thanks for your input. Personally, I prefer to combine the p-value and the
fold change (FC), because you cannot validate every gene detected, but
picking some with a higher FC is probably feasible.
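
A minimal sketch of such a combined filter, in base R so that it runs
without limma. The data frame is made up and only mimics the logFC and
adj.P.Val columns that topTable returns; the two cutoffs and all object
names are assumptions for illustration, not recommended values.

```r
# Hypothetical results table using the column names of limma's topTable()
# output (logFC, adj.P.Val); the numbers are invented for illustration.
res <- data.frame(
  gene      = c("g1", "g2", "g3", "g4", "g5"),
  logFC     = c(2.4, 0.5, -1.8, 0.2, -3.1),
  adj.P.Val = c(0.001, 0.0004, 0.03, 0.60, 0.20),
  stringsAsFactors = FALSE
)

p_cut   <- 0.05  # assumed adjusted p-value cutoff
lfc_cut <- 1     # assumed minimum absolute log2 fold change

# Keep genes that pass both cutoffs, largest absolute fold change first
keep       <- res$adj.P.Val < p_cut & abs(res$logFC) >= lfc_cut
candidates <- res[keep, ]
candidates <- candidates[order(-abs(candidates$logFC)), ]
print(candidates)
```

With a fitted limma object the same two cutoffs can be passed directly,
for example topTable(fitGene2, coef = 1, number = Inf, p.value = 0.05,
lfc = 1), since topTable and decideTests both take p.value and lfc
arguments.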

Sabrina



On Mon, Jan 25, 2010 at 12:05 AM, Sunny Srivastava
<research.baba at gmail.com> wrote:

> Dear Sabrina,
> Experienced members of the group will have better things to say, but here
> is my $0.25.
> As a statistician, I would prefer Design 1. The reason is that data
> should never be ignored.
>
> Also, the more data there are, the more information limma can use in the
> empirical Bayes estimation of the S.D. The lower p-values come from this
> fact. (Taking less data might result in inflated S.D.s, which in turn
> result in higher p-values.)
>
> Comparing differential expression and fold change is like comparing apples
> and oranges. Differential expression has nothing to do with low fold
> change. As a statistician, I would always trust differential expression
> more than fold change.
> If you think that fold change is important for you, then you should select
> the differentially expressed genes ONLY if their log fold change is above,
> say, 2.
>
> You can do this in limma using topTable and/or decideTests.
>
> Pls correct me if I am wrong.
>
> Thx
> S.
>
> On Thu, Jan 21, 2010 at 1:32 PM, sabrina s <sabrina.shao at gmail.com> wrote:
>
>> Hi, Jenny:
>> Thanks for the quick reply, and thanks for the pointer about posting. I
>> thought maybe my subject was not good enough to be noticed, and that is
>> why I posted again. This is my first post, so a long way to go!
>> Regarding your second point: I don't think my question is a general one
>> about why ANOVA is better than a series of t-tests. I actually did both,
>> but realized that the result from one single model (using all samples)
>> gave me much lower p-values, yet when I looked at the expression values,
>> the fold change was next to nothing, like 0.5. That is why I wonder
>> whether the inflated DOF gave me the much lower p-values. Any thoughts on
>> that?
>>
>> Thanks!
>>
>> Sabrina
>>
>> On Thu, Jan 21, 2010 at 12:05 PM, Jenny Drnevich
>> <drnevich at illinois.edu> wrote:
>>
>> > Hi Sabrina,
>> >
>> > First, a little list etiquette. If you don't get a response to a post
>> > within a day, it's not considered polite to just repost the same
>> > question verbatim the next day under a different Subject.
>> >
>> > Second: your question isn't specific to the modeling of lmFit. Instead,
>> > it's a general statistical question about why it's better to fit one
>> > ANOVA model instead of a series of t-tests. I suggest you consult a
>> > basic statistical textbook or a local statistician to find the answer.
>> >
>> > Cheers,
>> > Jenny
>> >
>> >
>> > At 10:39 AM 1/21/2010, sabrina s wrote:
>> >
>> >> Hello, everyone:
>> >>
>> >> I have a question related to conceptual understanding of lmFit.
>> >>
>> >> I have the following experiment that I want to conduct, but I am not
>> >> sure which is the right way to use the design matrix and contrasts.
>> >> Here is the experiment:
>> >>
>> >> Say I have 3 genetically different strains, A, B and C, where A is the
>> >> control. I also have two different treatments, T1 and T2. For each
>> >> strain, I have 4 arrays for each treatment, so in total I have 24
>> >> arrays. What I want to find are the significantly differentially
>> >> expressed genes for the following comparisons:
>> >> 1) for control strain A:  T1 vs T2
>> >> 2) under T1, B vs. A (control)
>> >> 3) under T1, C vs. A
>> >> 4) for B, T1 vs T2
>> >> 5) for C, T1 vs T2
>> >> 6) interaction term of A and B , T1 and T2
>> >> 7) interaction term of A and C, T1 and T2.
>> >>
>> >> There are two ways I could use lmFit
>> >>
>> >> One is:
>> >>
>> >> For the design matrix, I include all 3 strains and 2 conditions, with
>> >> one column per strain and condition combination:
>> >>
>> >>            A_T1  A_T2  B_T1  B_T2  C_T1  C_T2
>> >> sample1:   1     0     0     0     0     0
>> >> sample2:
>> >>
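>> >> Then I make a contrast matrix and run the code below:
>> >>
>> >> # fit the linear model to every gene, using the array weights
>> >> fitGene <- lmFit(gene, design = design, weights = arrayWt)
>> >> # re-express the fit in terms of the contrasts of interest
>> >> fitGene2 <- contrasts.fit(fitGene, cont.matrix)
>> >> # empirical Bayes moderation of the gene-wise variances
>> >> fitGene2 <- eBayes(fitGene2, proportion = p)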
>> >>
>> >>
>> >> Two:
>> >> Instead of fitting all samples at once in a single lmFit call, I use
>> >> two design matrices: one that involves only A and B, T1 and T2, and a
>> >> second that involves only A and C, T1 and T2. I make a contrast matrix
>> >> for each and fit them separately. Later on I can compare these two
>> >> results if I want to.
>> >>
>> >>
>> >>
>> >> The question I have is: which one is the right one? With the first
>> >> method, I have a larger DOF and much lower p-values, but it is testing
>> >> the same thing as the second one, so am I creating an artifact? Thanks
>> >> for your help!
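
The degrees-of-freedom part of this question can be sketched in base R,
without limma. With one column per group, fitting all 24 arrays leaves
24 - 6 = 18 residual degrees of freedom per gene, while fitting only the
A and B arrays leaves 16 - 4 = 12. In an unweighted fit the contrast
"B vs A under T1" is the same difference of group means either way; what
changes is how many arrays feed the residual variance estimate. The array
order and all object names below are assumptions for illustration.

```r
# Group labels for the 24 arrays: 3 strains x 2 treatments x 4 replicates.
# The array order is an assumption for illustration.
strain    <- rep(c("A", "B", "C"), each = 8)
treatment <- rep(rep(c("T1", "T2"), each = 4), times = 3)
group     <- factor(paste(strain, treatment, sep = "_"))

# First approach: one column per group, no intercept, all arrays
design_all <- model.matrix(~ 0 + group)
colnames(design_all) <- levels(group)

# Second approach: only the arrays from strains A and B
use_ab    <- strain %in% c("A", "B")
group_ab  <- droplevels(group[use_ab])
design_ab <- model.matrix(~ 0 + group_ab)
colnames(design_ab) <- levels(group_ab)

# Residual degrees of freedom = number of arrays - number of group means
df_all <- nrow(design_all) - qr(design_all)$rank
df_ab  <- nrow(design_ab)  - qr(design_ab)$rank

# Contrast "B vs A under T1", written against the six-group design
contrast_B_vs_A_T1 <- c(A_T1 = -1, A_T2 = 0, B_T1 = 1,
                        B_T2 = 0, C_T1 = 0, C_T2 = 0)

cat("residual df, all 24 arrays:", df_all, "\n")
cat("residual df, A and B only: ", df_ab, "\n")
```
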
>> >>
>> >>
>> >>
>> >>
>> >> Sabrina
>> >>
>> >>        [[alternative HTML version deleted]]
>> >>
>> >> _______________________________________________
>> >> Bioconductor mailing list
>> >> Bioconductor at stat.math.ethz.ch
>> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >> Search the archives:
>> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >>
>> >
>> > Jenny Drnevich, Ph.D.
>> >
>> > Functional Genomics Bioinformatics Specialist
>> > W.M. Keck Center for Comparative and Functional Genomics
>> > Roy J. Carver Biotechnology Center
>> > University of Illinois, Urbana-Champaign
>> >
>> > 330 ERML
>> > 1201 W. Gregory Dr.
>> > Urbana, IL 61801
>> > USA
>> >
>> > ph: 217-244-7355
>> > fax: 217-265-5066
>> > e-mail: drnevich at illinois.edu
>> >
>>
>>
>>
>> --
>> Sabrina
>>
>>
>
>


-- 
Sabrina



