[BioC] Use probesets with highest baseline expression for differential gene expression in LIMMA
James W. MacDonald
jmacdon at uw.edu
Thu Feb 23 15:24:49 CET 2012
On 2/22/2012 10:06 PM, Ekta Jain wrote:
> Hi Jim,
> I am using Affymetrix chip data. I need to analyse my dataset for differential gene expression (LIMMA). Each gene can be referenced by multiple probesets, and while performing LIMMA the expression values of these multiple probesets get averaged and this averaged value is assigned to that gene. I need to be able to simply select the probeset with the highest expression value to represent a gene.
> LIMMA by default averages the probeset values.
This is not true. The limma package doesn't know or care that two
probesets are intended to interrogate the same gene, and it doesn't do
the averaging that you think it does. You can't even fit a mixed model
that treats such probesets as 'duplicates', because they aren't
duplicates and you don't have the same number of probesets per gene.
What limma does is make univariate comparisons probeset by probeset, so
if you have four probesets that interrogate the same gene transcript,
then you will do four tests.
Now you could make the assumption (unfounded, IMO) that all the
probesets that are intended to measure a particular transcript are
really measuring the same thing, and then choose to use just one of them
based on some metric. As an example, you could use 'highest expression
value', which doesn't make any sense to me.
To expound on that last statement, let's say you have two probesets
that are purported to measure the same gene. Now let's further stipulate
that one of these probesets has really high expression (somewhere around
2^14), but the expression isn't materially different between any of your
samples. In addition, the other probeset has almost undetectable
expression in one set of samples, but some middling expression (say
2^8) in another set. Do you really want to throw out the latter probeset
in favor of the former?
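To make that scenario concrete, here is a minimal base R sketch. The
expression values, probeset names and group labels are all made up for
illustration; they are not from any real chip.

```r
# Hypothetical log2 expression for two probesets annotated to the same gene:
# three control samples followed by three treated samples.
expr <- rbind(
  probeA = c(14.0, 14.1, 13.9, 14.0, 14.1, 13.9),  # very high, but flat
  probeB = c( 3.0,  3.1,  2.9,  8.0,  8.1,  7.9)   # near background vs. about 2^8
)
group <- factor(rep(c("control", "treated"), each = 3))

# The selection rule under discussion: keep the probeset with the highest mean.
kept <- names(which.max(rowMeans(expr)))

# Difference in group means (log2 fold change) for each probeset.
is_treated <- group == "treated"
log_fc <- rowMeans(expr[, is_treated]) - rowMeans(expr[, !is_treated])

print(kept)
print(round(log_fc, 2))
```

With these made-up numbers the 'highest expression' rule keeps probeA,
which shows no change between groups, and discards probeB, which
differs by about 5 on the log2 scale.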
Now back to your question. If you want to pre-filter the data (again,
not recommended with the limma package, due to the empirical Bayes
estimator), you can use the findLargest() function in the genefilter
package. You have to supply a test statistic to this function, for which
you could use either rowMeans(), which will give you the highest
average expression, or something like apply(exprs(eset), 1, max) to get
the maximum expression value.
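If it helps to see the idea spelled out, the following is a rough base R
sketch of 'keep one probeset per gene, chosen by largest statistic'. It
is not the findLargest() implementation; the matrix, the probeset names
and the probeset-to-gene map are hypothetical (in practice the map would
come from the chip's annotation package, and the matrix from
exprs(eset)).

```r
# Hypothetical expression matrix (rows = probesets, columns = samples).
expr <- rbind(
  ps1 = c(7.0, 7.2, 6.8),
  ps2 = c(9.0, 9.4, 9.2),
  ps3 = c(5.0, 5.1, 4.9),
  ps4 = c(4.0, 4.2, 4.1)
)

# Hypothetical probeset-to-gene map.
gene <- c(ps1 = "GENE1", ps2 = "GENE1", ps3 = "GENE2", ps4 = "GENE3")

# Per-probeset statistic: average expression.
# Swap in apply(expr, 1, max) to use the maximum expression value instead.
stat <- rowMeans(expr)

# For each gene, the name of the probeset with the largest statistic.
keep <- vapply(
  split(stat, gene[names(stat)]),
  function(s) names(which.max(s)),
  character(1)
)

# Expression matrix restricted to one probeset per gene.
expr_filtered <- expr[keep, , drop = FALSE]
print(keep)
```

The filtered matrix could then be passed on to the usual limma steps,
with the caveat above about pre-filtering still applying.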
> I am not sure if I need to modify any default settings in LIMMA or use another package.
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at uw.edu]
> Sent: 22 February 2012 19:26
> To: Ekta [guest]
> Cc: bioconductor at r-project.org; Ekta Jain
> Subject: Re: [BioC] Use probesets with highest baseline expression for differntial gene expression in LIMMA
> Hi Ekta,
> On 2/21/2012 10:57 PM, Ekta [guest] wrote:
>> Hello All,
>> I am relatively new to R and Bioconductor. I would like to know if there is a way to alter LIMMA default options such that the package, instead of averaging signal intensities of probesets, selects the probesets with highest baseline
>> expression/signal intensity?
> You will have to be more precise than that. What exactly do you mean by
> 'selects the probesets with highest baseline expression'? Do you just
> want any probesets where one or more samples has high expression? That
> doesn't require limma. Or do you want probesets where some of the
> samples have much higher expression than others?
>> Any help would be greatly appreciated.
>> -- output of sessionInfo():
>> R version 2.9.1 (2009-06-26)
>> attached base packages:
>>  stats graphics grDevices utils datasets methods base
>> other attached packages:
>>  limma_2.18.3