[BioC] Analysis with MBNI re-mapped (custom) CDF files

Richard Pearson richard.pearson at postgrad.manchester.ac.uk
Fri Feb 2 13:00:44 CET 2007


Hi Guido

The PPLR method can be used to propagate expression-level uncertainty 
information into the detection of differentially expressed genes (DEGs). 
We recommend using the mmgMOS method to estimate expression levels and 
the standard errors of these levels, but there is no reason why you 
couldn't use the standard errors from methods such as fitPLM from the 
affyPLM package, or MBEI (the "liwong" summary method in the affy 
package). In fact, I'd be quite interested in how you get on doing this, 
particularly with the re-mapped CDF files.
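To make the idea of propagating standard errors concrete, here is a much simplified sketch. It is NOT the PPLR model or the PPLR package API. It assumes that each log2 expression estimate is Gaussian with a known standard error (for example as produced by mmgMOS) and that the two conditions are independent. It then computes the probability that the log ratio is positive. The function name and the example numbers are hypothetical.

```r
# Simplified illustration only, NOT the PPLR model:
# each log2 expression estimate is treated as Gaussian with a known standard
# error, and the two conditions are assumed to be independent.
prob_positive_log_ratio <- function(mu1, se1, mu2, se2) {
  # under these assumptions the log ratio (mu2 - mu1) has
  # standard deviation sqrt(se1^2 + se2^2)
  pnorm((mu2 - mu1) / sqrt(se1^2 + se2^2))
}

# Two hypothetical genes with the same observed log2 ratio (1.0)
# but very different measurement uncertainty
p_precise <- prob_positive_log_ratio(8, 0.1, 9, 0.1)
p_noisy   <- prob_positive_log_ratio(8, 1.0, 9, 1.0)
print(c(precise = p_precise, noisy = p_noisy))
```

Both genes have the same fold change, but the gene with noisy estimates gets a much less confident call. Discarding the standard errors would hide exactly this difference.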

PPLR is currently available as an R package from 
http://www.bioinf.manchester.ac.uk/resources/puma/, although the 
documentation is fairly limited. I am developing a Bioconductor package 
called "puma" (Propagating Uncertainty in Microarray Analysis). puma 
will include mmgMOS and an extension of PPLR to multi-factorial 
experiments, as well as uncertainty propagation versions of principal 
components analysis (PCA) and clustering. It will also have more 
extensive documentation and case studies than the existing packages. 
This should be available in Bioconductor 2.0, but I plan to have a 
development version (with limited function-level documentation) ready in 
the next few weeks. Let me know if you'd like to try out the development 
version.

Best regards

Richard

--
Richard Pearson
School of Computer Science,
University of Manchester,
Oxford Road,
Manchester M13 9PL, UK.
http://www.cs.man.ac.uk/~pearsonr/


Hooiveld, Guido wrote:
> Dear list,
>
> Because I like the underlying idea, I have begun using the re-mapped CDF files provided by the MBNI. However, I was prompted by a remark made by Dr MacDonald in this thread, http://article.gmane.org/gmane.science.biology.informatics.conductor/11282:
> "... note that there are some downsides to using these cdfs, mainly that the standard errors of your estimates will be highly variable, since the probesets for these cdfs are quite variable in size (unlike the stock affy chip, where the vast majority have 11 probes)".
> I therefore determined the number of probes that map to each probe set for both the default Affymetrix CDF file and the Entrez Gene-based re-mapped CDF file for the Mouse430_2 array.
>
> Outcomes:
> library(mouse4302probe)
> a <- as.data.frame(mouse4302probe)  # one row per probe
> b <- as.factor(a[,4])               # column 4 = Probe.Set.Name
> table(table(b))                     # how many probe sets have each size
>
>     8     9    10    11    20    21 
>     1     5    20 45032    40     3 
>
>
>
> library(mm430mmentrezgprobe)
> a <- as.data.frame(mm430mmentrezgprobe)  # one row per probe
> b <- as.factor(a[,4])                    # column 4 = Probe.Set.Name
> table(table(b))                          # how many probe sets have each size
>
>
>    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18 
>  230  213  219  283  419  663 1265 1741 5092  284  261  234  193  205  206  255 
>
>   19   20   21   22   23   24   25   26   27   28   29   30   31   32   33   34 
>  412  569  639 1249  121   98   96   91   72   89  113  122  173  166  279   38 
>
>   35   36   37   38   39   40   41   42   43   44   45   46   47   48   49   50 
>   39   30   32   36   20   35   41   46   40   50   18   15   10    6    8    9 
>
>   51   52   53   54   55   56   57   58   60   61   62   63   64   65   66   67 
>    9   14   13   12   18    6    6    1    4    3    4    2    2    2    1    1 
>
>   68   70   71   73   74   75   76   80   89 
>    3    3    3    3    2    2    1    1    1 
>
>
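The two size tables above can be condensed into a few summary numbers. This sketch only re-uses the counts printed by the two `table(table(b))` calls. The helper names `share_equal` and `share_below` are hypothetical. It reports, for each CDF, the number of probe sets and the fraction with exactly 11 probes and with fewer than 11 probes.

```r
# Probe set size distributions, copied from the two table(table(b)) outputs above
affy_sizes  <- c(8, 9, 10, 11, 20, 21)
affy_counts <- c(1, 5, 20, 45032, 40, 3)

remap_sizes  <- c(3:58, 60:68, 70, 71, 73:76, 80, 89)
remap_counts <- c(
  230, 213, 219, 283, 419, 663, 1265, 1741, 5092, 284, 261, 234, 193, 205, 206, 255,
  412, 569, 639, 1249, 121, 98, 96, 91, 72, 89, 113, 122, 173, 166, 279, 38,
  39, 30, 32, 36, 20, 35, 41, 46, 40, 50, 18, 15, 10, 6, 8, 9,
  9, 14, 13, 12, 18, 6, 6, 1, 4, 3, 4, 2, 2, 2, 1, 1,
  3, 3, 3, 3, 2, 2, 1, 1, 1
)

# Fraction of probe sets with exactly n probes, and with fewer than n probes
share_equal <- function(sizes, counts, n) sum(counts[sizes == n]) / sum(counts)
share_below <- function(sizes, counts, n) sum(counts[sizes < n]) / sum(counts)

summary_tab <- rbind(
  affy  = c(n_sets = sum(affy_counts),
            eq11   = share_equal(affy_sizes, affy_counts, 11),
            lt11   = share_below(affy_sizes, affy_counts, 11)),
  remap = c(n_sets = sum(remap_counts),
            eq11   = share_equal(remap_sizes, remap_counts, 11),
            lt11   = share_below(remap_sizes, remap_counts, 11))
)
print(round(summary_tab, 3))
```

For the stock CDF nearly every probe set has 11 probes. For the re-mapped CDF, those with exactly 11 probes form only a minority, which is the variability Dr MacDonald pointed out.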
> This indeed confirms Dr MacDonald's observations, which I would like to address in more detail...
> However, as a biologist with limited experience in statistics and R/BioC, I do have some (practical) questions:
>
> - How can I extract the names of (let's say) the 230 probe sets that consist of 3 probes?
> - When applying RMA, probe set expression levels are summarized using median polish. What is the minimum number of probes (x) needed to obtain a robust summary with median polish? In other words, would probe sets consisting of fewer than x probes be better left out?
> - Can the standard errors of the RMA expression estimates be extracted from an eSet? If so, how could they be propagated into the statistical analysis (e.g. limma) that is used to identify DEGs?
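For the first question, one approach is to tabulate the probe set names and keep those with the requested count. The sketch below assumes a data frame with one row per probe and a `Probe.Set.Name` column, as returned by `as.data.frame()` on a probe package. The helper name is hypothetical, and the toy data frame stands in for the real probe table.

```r
# Hypothetical helper: return the names of probe sets that have exactly n probes.
# 'probe_df' is assumed to have one row per probe, with the probe set name in
# the column given by 'id_col'.
probesets_with_n_probes <- function(probe_df, n, id_col = "Probe.Set.Name") {
  sizes <- table(probe_df[[id_col]])   # number of probes per probe set
  names(sizes)[sizes == n]             # keep only the sets of the requested size
}

# Toy data standing in for as.data.frame(mm430mmentrezgprobe)
toy <- data.frame(
  Probe.Set.Name = c("A_at", "A_at", "A_at",
                     "B_at", "B_at",
                     "C_at", "C_at", "C_at"),
  stringsAsFactors = FALSE
)
three_probe_sets <- probesets_with_n_probes(toy, 3)
print(three_probe_sets)
```

With the objects from the second code chunk above, where `b` holds the re-mapped probe set names, `names(which(table(b) == 3))` should return the corresponding list directly.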
>
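The toy example below shows the mechanics behind the median polish question. It uses base R's `stats::medpolish` on a probes-by-arrays matrix of (assumed already background-corrected and normalized) log2 intensities. The summarized value per array is taken as the overall effect plus the column effect. This is only an illustration: `rma()` in affy uses its own compiled implementation, and the probe and array effects here are made-up numbers.

```r
# Toy illustration of median-polish summarization of one probe set
# rows = probes, columns = arrays, values = log2 intensities
probe_effect <- c(0.0, 0.8, -0.5)    # hypothetical probe affinities
array_effect <- c(7.0, 7.5, 9.0)     # hypothetical true log2 expression per array
y <- outer(probe_effect, array_effect, "+")
y[2, 3] <- y[2, 3] + 4               # one outlying probe on array 3

fit  <- medpolish(y, trace.iter = FALSE)
expr <- fit$overall + fit$col        # summarized log2 expression per array
print(round(expr, 2))

# For comparison, plain column means are pulled up by the outlier on array 3
print(round(colMeans(y), 2))
```

With only three probes, each column median is taken over just three values, so two outlying probes on the same array would already dominate it. This shows why small probe sets are more fragile. It does not by itself settle what minimum probe set size is acceptable.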
> FYI: as a biologist I have concluded that re-mapping improved my analyses. When comparing the lists of most regulated genes from analyses with the Affy or the re-mapped CDF, the latter identified genes that were missing from the Affy top list, although those genes were expected to be present based on prior knowledge. However, this only applies to the top-regulated genes (which are expressed at relatively high levels); I haven't carefully evaluated the complete lists yet.
>    
> Guido
>  
> ------------------------------------------------
> Guido Hooiveld, PhD
> Nutrition, Metabolism & Genomics Group
> Division of Human Nutrition
> Wageningen University
> Biotechnion, Bomenweg 2
> NL-6703 HD Wageningen
> the Netherlands
>  
> tel: (+)31 317 485788
> fax: (+)31 317 483342
>  
> internet:   http://nutrigene.4t.com
> email:      guido.hooiveld at wur.nl
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


