[BioC] Use probesets with highest baseline expression for differential gene

Fri Feb 24 00:30:22 CET 2012

Dear Ekta,

Jim has already pointed out that you have some incorrect perceptions about 
what limma does by default.

If you need to keep one probe for each gene symbol after a limma lmFit, 
and you want to choose the probe with highest average expression, it is 
easy to do like this.  I will assume that your linear model fit object is 
called 'fit', and your annotation includes a column called "Symbol" 
containing the gene symbol.

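    # Order probes from highest to lowest average log-expression
    o <- order(fit$Amean, decreasing=TRUE)
    # Flag every probe after the first (highest-expressed) one for each symbol
    dup <- duplicated(fit$genes$Symbol[o])
    # Keep only the top probe per symbol
    fit.unique <- fit[o,][!dup,]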

Now your fit object fit.unique has only one row for each symbol.
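
To see why ordering first and then calling duplicated() keeps the 
highest-expressed probe, here is a minimal base R sketch on a toy table. 
The probe IDs, symbols and Amean values are made up for illustration; a 
plain data frame stands in for fit$genes and a numeric vector for 
fit$Amean, so limma itself is not needed to run it.

```r
# Toy annotation: hypothetical probe IDs and symbols (not from any real chip).
genes <- data.frame(
  ProbeID = c("p1", "p2", "p3", "p4", "p5"),
  Symbol  = c("TP53", "TP53", "BRCA1", "MYC", "MYC"),
  stringsAsFactors = FALSE
)

# Made-up average log-expression values standing in for fit$Amean.
Amean <- c(6.2, 9.1, 7.4, 5.0, 4.3)

# Sort rows from highest to lowest average expression.
o <- order(Amean, decreasing = TRUE)

# After sorting, the first occurrence of each symbol is its highest-expressed
# probe, so duplicated() flags all lower-expressed probes for that symbol.
dup <- duplicated(genes$Symbol[o])

# Keep only the first (highest-expressed) probe per symbol.
keep <- o[!dup]
genes.unique <- genes[keep, ]
print(genes.unique)
```

One thing to watch: duplicated() treats NA values as equal to one another, 
so probes with no symbol would all collapse to a single row. If that is not 
wanted, it may be better to drop rows with a missing symbol before ordering.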

This sort of filtering has been used in many papers where the aim is to 
match symbols across platforms, or to do gene set testing.

Best wishes
Gordon

------------------ original message ----------------
[BioC]  Use probesets with highest baseline expression for differential 
gene expression in LIMMA

Ekta Jain Ekta_Jain at jubilantbiosys.com
Thu Feb 23 04:06:09 CET 2012

Hi Jim,
I am using Affymetrix chip data. I need to analyse my dataset for 
differential gene expression (LIMMA). Each gene can be referenced by 
multiple probesets and while performing LIMMA the expression values of 
these multiple probesets get averaged and this averaged value is assigned 
to that gene. I need to be able to simply select the probeset with the 
highest expression value to represent a gene.

LIMMA by default averages the probeset values.

I am not sure if I need to modify any default settings in LIMMA or use 
another package.

Thanks

Regards,
Ekta

-----Original Message-----
From: James W. MacDonald [mailto:jmacdon at uw.edu]
Sent: 22 February 2012 19:26
To: Ekta [guest]
Cc: bioconductor at r-project.org; Ekta Jain
Subject: Re: [BioC] Use probesets with highest baseline expression for 
differential gene expression in LIMMA

Hi Ekta,

On 2/21/2012 10:57 PM, Ekta [guest] wrote:
> Hello All,
> I am relatively new to R and Bioconductor. I would like to know if there
> is a way to alter LIMMA default options such that the package, instead of
> averaging signal intensities of probesets, selects the probesets with
> highest baseline expression/signal intensity?

You will have to be more precise than that. What exactly do you mean by
'selects the probesets with highest baseline expression'? Do you just
want any probesets where one or more samples has high expression? That
doesn't require limma. Or do you want probesets where some of the
samples have much higher expression than others?

Best,

Jim

>
> Any help would be greatly appreciated.
>
>
>
>   -- output of sessionInfo():
>
>> sessionInfo()
> R version 2.9.1 (2009-06-26)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_India.1252;LC_CTYPE=English_India.1252;LC_MONETARY=English_India.1252;LC_NUMERIC=C;LC_TIME=English_India.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] limma_2.18.3
