[BioC] Question about hgu133plus2cdf?

Nicolas Delhomme delhomme at embl.de
Thu Mar 15 14:37:12 CET 2012


Dear Fabrice,

The hgu133plus2cdf package in Bioc is based on the information provided by Affymetrix, so it is not the same as the custom CDF.

The custom CDF from the website you mention contains probes re-aligned to the human genome; only probes with a unique mapping are kept. See their publication: Dai et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Research (2005) vol. 33 (20) pp. e175.
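
To illustrate what "unique mapping" means in practice, here is a minimal base R sketch. It is not the actual pipeline of Dai et al.; the alignment table, probe IDs and coordinates are all invented for the example.

```r
# Toy alignment results: one row per (probe, genomic hit).
# All IDs and coordinates are invented for illustration.
alignments <- data.frame(
  probe_id = c("p1", "p2", "p2", "p3", "p4", "p4", "p4"),
  chr      = c("chr1", "chr1", "chr7", "chr2", "chr2", "chr5", "chrX"),
  start    = c(100L, 200L, 9000L, 500L, 600L, 1200L, 77L),
  stringsAsFactors = FALSE
)

# Count the genomic hits per probe and keep the probes with exactly one hit.
hits_per_probe <- table(alignments$probe_id)
unique_probes  <- names(hits_per_probe)[hits_per_probe == 1]

# Alignments of the uniquely mapping probes only.
unique_alignments <- alignments[alignments$probe_id %in% unique_probes, ]
```

Here p2 and p4 hit several positions and are discarded, while p1 and p3 are kept.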

That won't solve your SNP problem, but for that you can use the probe sequences, either from the hgu133plus2probe package or from the files provided by Dai et al. Based on these sequences and their mapping, you should be able to filter out the probes that contain SNPs you want to avoid. The IRanges functionality might prove helpful there. Whether you then drop the whole probe-set or re-create your own CDF is up to you.
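
As a rough sketch of that filtering step, assuming you already have genomic coordinates for each probe and a table of SNP positions (both are toy data here, with made-up names and coordinates). It uses base R only so that it runs as-is; on a real array, findOverlaps() from IRanges would do the same interval test far more efficiently.

```r
# Toy probe table: IDs, probe-set names and coordinates are invented.
probes <- data.frame(
  probe_id = c("p1", "p2", "p3", "p4"),
  probeset = c("setA", "setA", "setB", "setB"),
  chr      = c("chr1", "chr1", "chr2", "chr2"),
  start    = c(100L, 200L, 500L, 600L),
  stringsAsFactors = FALSE
)
probes$end <- probes$start + 24L  # 25-mer probes

# Toy SNP table (hypothetical positions).
snps <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  pos = c(110L, 224L, 550L),
  stringsAsFactors = FALSE
)

# TRUE for each probe whose interval covers at least one SNP
# on the same chromosome.
has_snp <- vapply(seq_len(nrow(probes)), function(i) {
  any(snps$chr == probes$chr[i] &
      snps$pos >= probes$start[i] &
      snps$pos <= probes$end[i])
}, logical(1))

# Option 1: drop only the affected probes.
clean_probes <- probes[!has_snp, ]

# Option 2: drop every probe-set that has at least one affected probe.
bad_sets   <- unique(probes$probeset[has_snp])
clean_sets <- setdiff(unique(probes$probeset), bad_sets)
```

Option 1 keeps as many probes as possible but changes the probe-set composition, which is why you would then need your own CDF. Option 2 leaves the remaining probe-sets untouched.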

If you want to create your own CDF, check the vignette of the makecdfenv package: vignette("makecdfenv"). You might also want to make sure your new probe-sets are valid. This paper is a good starting point for that: Lu et al. Transcript-based redefinition of grouped oligonucleotide probe sets using AceView: high-resolution annotation for microarrays. BMC Bioinformatics (2007) vol. 8 pp. 108.

HTH,

Nico

P.S. Sorry, I missed the reply-all the first time.

---------------------------------------------------------------
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------


On 15 Mar 2012, at 13:44, Fabrice Tourre wrote:

> Dear list,
> 
> I am now analysing an hgu133plus2 array. I want a CDF from which
> probes with SNPs have been removed, because I want to remove the noise
> caused by single nucleotide polymorphisms (SNPs) in different samples.
> I also do not want probe-sets whose sequences map to
> multiple genome positions.
> 
> In Bioconductor, there is a package hgu133plus2cdf. I also noticed
> there is a website that provides a custom CDF file for hgu133plus2.
> 
> The website is:
> http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp
> HGU133Plus2 (Version 15.0.0, ENTREZG)
> 
> Are these two CDF files the same?
> 
> Or is the package hgu133plus2cdf taken directly from the Affy CDF file?
> 
> Thank you very much in advance.
> 