[BioC] Use probesets with highest baseline expression for differential gene expression in LIMMA

Wed Feb 29 11:42:17 CET 2012

Dear Jim,
Yes, I figured that out soon enough to proceed with the analysis. I annotated my eset as follows:

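library(annotate)   # provides getSYMBOL()

# Look up a gene symbol for every probeset ID on the array
ID <- featureNames(eset)
Symbol <- getSYMBOL(ID, "mogene10sttranscriptcluster.db")
tmp <- data.frame(ID=ID, Symbol=Symbol)
tmp[tmp=="NA"] <- NA   # turn any literal "NA" strings into missing values
fData(eset) <- tmp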
and used the following to keep, for each gene symbol, only the probeset with the highest average expression:

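fit <- lmFit(eset, design)
# Rank probesets by average expression, highest first
o <- order(fit$Amean, decreasing=TRUE)
# After sorting, every repeat of a symbol is a lower-expressed probeset
dup <- duplicated(fit$genes$Symbol[o])
fit.unique <- fit[o,][!dup,]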
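
The order()/duplicated() idiom above can be checked on a small example without limma. The following is a minimal sketch in base R; the probeset IDs, symbols and Amean values are made up for illustration, and the data frame stands in for the fit object.

```r
# Toy data: five probesets mapping to three gene symbols (all values hypothetical).
probes <- data.frame(
  ID     = c("p1", "p2", "p3", "p4", "p5"),
  Symbol = c("GeneA", "GeneA", "GeneB", "GeneC", "GeneC"),
  Amean  = c(7.2, 9.5, 6.1, 8.8, 8.3),
  stringsAsFactors = FALSE
)

# Sort probesets from highest to lowest average expression.
o <- order(probes$Amean, decreasing = TRUE)

# After sorting, the first occurrence of each symbol is its highest-expressed
# probeset; duplicated() flags every later occurrence.
dup <- duplicated(probes$Symbol[o])

# Keep one row per symbol.
probes.unique <- probes[o, ][!dup, ]
print(probes.unique)
```

One caveat: duplicated() treats repeated NA values as duplicates, so all probesets without a symbol collapse to a single row unless they are handled separately.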

Thanks very much for your help.
Best,
Ekta

-----Original Message-----
From: James W. MacDonald [mailto:jmacdon at uw.edu] 
Sent: 28 February 2012 03:22
To: Ekta Jain
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Use probesets with highest baseline expression for differntial gene expression in LIMMA

Hi Ekta,

The relevant annotation packages do indeed exist for the Human Gene ST 
arrays on the latest version of R. Try

source("http://www.bioconductor.org/biocLite.R")
biocLite("hugene10sttranscriptcluster.db")

You may also need to change the annotation of your ExpressionSet, which 
you can do with something like:

annotation(eset) <- "hugene10sttranscriptcluster.db"

Best,

Jim

On 2/27/12 3:10 AM, Ekta Jain wrote:
> Hello Jim,
> Thank you very much for your detailed reply. I did indeed have some misconceptions about LIMMA. Unfortunately, I am not in charge of the methodology in this case, and the requirement is to keep only the probeset with the maximum expression value for each gene symbol and to ignore the expression values of the others.
> I am afraid I am unable to use the findLargest() function from the genefilter package, since it needs Entrez ID annotation and I am using annotation from a tab-delimited text file. I am working with the Human Gene 1.0 ST Array, and the relevant packages do not exist for the latest version of R. I will try to tweak it in my favour.
>
> Alternatively, I also tried the solution provided by Gordon but encountered memory errors. I will have to try the same on a machine with more RAM.
>
> Thanks and Regards,
> Ekta
>
>
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at uw.edu]
> Sent: 23 February 2012 19:55
> To: Ekta Jain
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] Use probesets with highest baseline expression for differntial gene expression in LIMMA
>
> Hi Ekta,
>
> On 2/22/2012 10:06 PM, Ekta Jain wrote:
>> Hi Jim,
>> I am using Affymetrix chip data and need to analyse my dataset for differential gene expression with LIMMA. Each gene can be referenced by multiple probesets, and while performing LIMMA the expression values of these multiple probesets get averaged and this averaged value is assigned to that gene. I need to be able to simply select the probeset with the highest expression value to represent a gene.
>>
>> LIMMA by default averages the probeset values.
> This is not true. The limma package doesn't know or care that two
> probesets are intended to interrogate the same gene, and doesn't do the
> averaging that you think it does. You can't even do a mixed model using
> the 'duplicate' probesets, because they aren't duplicates, and you don't
> have the same number of probesets per gene. What limma does is make
> univariate comparisons by-probeset, so if you have four probesets that
> interrogate the same gene transcript, then you will do four tests.
>
> Now you could make the assumption (unfounded, IMO) that all the
> probesets that are intended to measure a particular transcript are
> really measuring the same thing, and then choose to use just one of them
> based on some metric. As an example, you could use 'highest expression
> value', which doesn't make any sense to me.
>
> To expound on that last statement, let's say you have two probesets
> that are purported to measure the same gene. Now let's further stipulate
> that one of these probesets has really high expression (somewhere around
> 2^14), but the expression isn't materially different between any of your
> samples. In addition, the other probeset has almost undetectable
> expression in one set of samples, but some middling expression (say
> 2^8) in another set. Do you really want to throw out the latter probeset
> in favor of the former?
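
The scenario described above can be sketched with simulated data. This is only an illustration in base R under assumed numbers: values are on the log2 scale, the group labels, sample sizes, background level of about 3 and noise level are all invented, and the group difference is a plain difference in means rather than a limma fit.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

group <- factor(rep(c("control", "treated"), each = 4))

# Probeset 1: very high expression (log2 scale, about 14), no group difference.
high.flat <- rnorm(8, mean = 14, sd = 0.2)

# Probeset 2: near background in controls (about 3), moderate in treated (about 8).
low.changing <- c(rnorm(4, mean = 3, sd = 0.2), rnorm(4, mean = 8, sd = 0.2))

expr <- rbind(high.flat = high.flat, low.changing = low.changing)

# Selecting on highest average expression keeps the flat probeset.
kept <- names(which.max(rowMeans(expr)))

# Difference in group means (treated minus control) for each probeset.
diffs <- rowMeans(expr[, group == "treated"]) - rowMeans(expr[, group == "control"])

print(kept)
print(diffs)
```

In this toy setup the filter retains the probeset whose group means are nearly identical and discards the one that differs by several log2 units between groups.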
>
> Now back to your question. If you want to pre-filter the data (again,
> not recommended with the limma package, due to the empirical Bayes
> estimator), you can use the findLargest() function in the genefilter
> package. You have to supply a test statistic to this function, for which
> you could use either rowMeans(), which will give you the highest
> average expression, or you could do something like apply(exprs(eset), 1,
> max) to get the maximum expression value.
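
The two candidate statistics mentioned above need not pick the same probeset. The sketch below uses base R only and does not call findLargest(); pickLargest() is a hypothetical helper written for this illustration, and the expression values and gene labels are made up.

```r
# Toy expression matrix: rows are probesets, columns are samples (values hypothetical).
expr <- rbind(
  p1 = c(5.0, 5.2, 5.1, 12.0),
  p2 = c(8.0, 8.1, 7.9, 8.0),
  p3 = c(6.0, 6.2, 6.1, 6.3)
)
gene <- c(p1 = "GeneA", p2 = "GeneA", p3 = "GeneB")

# Hypothetical helper: for each gene, return the probeset with the largest statistic.
pickLargest <- function(stat, gene) {
  vapply(split(stat, gene), function(s) names(s)[which.max(s)], character(1))
}

by.mean <- pickLargest(rowMeans(expr), gene)       # highest average expression
by.max  <- pickLargest(apply(expr, 1, max), gene)  # highest single value

print(by.mean)
print(by.max)
```

Here a single high sample makes p1 the choice for GeneA under the maximum, while the row mean prefers p2, so the choice of statistic matters.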
>
> Best,
>
> Jim
>
>
>> I am not sure if I need to modify any default settings in LIMMA or use another package.
>>
>> Thanks
>>
>> Regards,
>> Ekta
>>
>> -----Original Message-----
>> From: James W. MacDonald [mailto:jmacdon at uw.edu]
>> Sent: 22 February 2012 19:26
>> To: Ekta [guest]
>> Cc: bioconductor at r-project.org; Ekta Jain
>> Subject: Re: [BioC] Use probesets with highest baseline expression for differntial gene expression in LIMMA
>>
>> Hi Ekta,
>>
>> On 2/21/2012 10:57 PM, Ekta [guest] wrote:
>>> Hello All,
>>> I am relatively new to R and Bioconductor. I would like to know if there is a way to alter the LIMMA default options so that the package, instead of averaging the signal intensities of probesets, selects the probesets with the highest baseline expression/signal intensity?
>> You will have to be more precise than that. What exactly do you mean by
>> 'selects the probesets with highest baseline expression'? Do you just
>> want any probesets where one or more samples has high expression? That
>> doesn't require limma. Or do you want probesets where some of the
>> samples have much higher expression than others?
>>
>> Best,
>>
>> Jim
>>
>>
>>> Any help would be greatly appreciated.
>>>
>>>
>>>
>>>     -- output of sessionInfo():
>>>
>>>> sessionInfo()
>>> R version 2.9.1 (2009-06-26)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> LC_COLLATE=English_India.1252;LC_CTYPE=English_India.1252;LC_MONETARY=English_India.1252;LC_NUMERIC=C;LC_TIME=English_India.1252
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] limma_2.18.3
>>>
>>> --
>>> Sent via the guest posting facility at bioconductor.org.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> The information contained in this electronic message and in any attachments to this message is confidential, legally privileged and intended only for use by the person or entity to which this electronic message is addressed. If you are not the intended recipient, and have received this message in error, please notify the sender and system manager by return email and delete the message and its attachments and also you are hereby notified that any distribution, copying, review, retransmission, dissemination or other use of this electronic transmission or the information contained in it is strictly prohibited. Please note that any views or opinions presented in this email are solely those of the author and may not represent those of the Company or bind the Company. Any commitments made over e-mail are not financially binding on the company unless accompanied or followed by a valid purchase order. This message has been scanned for viruses and dangerous content by Mail Scanner, and is believed to be clean. The Company accepts no liability for any damage caused by any virus transmitted by this email.
>> www.jubl.com
>>