[BioC] Analysis of Affymetrix Human Gene 2.0 ST arrays

María Maqueda [guest] guest at bioconductor.org
Fri Nov 29 12:18:50 CET 2013

Dear all,
I am analyzing a set of Affymetrix Human Gene 2.0 ST arrays. This is my first time working with this type of array, so I have a few general questions. I would very much appreciate any advice you could give.

(1) I have obtained several lists of differentially expressed genes (using eBayes() from limma). Some control transcripts are showing up in those lists (e.g. the normgene -> intron category, among others). I was not expecting this type of transcript at this point. In theory, no control transcripts should appear after normalization, am I right? Have you experienced this?
I have read that one option is to apply getMainProbes before selecting genes with topTable, but I wonder whether something is wrong earlier on, in my normalization step (I used rma() from oligo, at the transcript level). What is your opinion?
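
To make the filtering idea in (1) concrete, here is a minimal sketch in base R. It is not my actual pipeline: the data frame, the probeset IDs, the category labels and all values are hypothetical stand-ins for a topTable() result that has been annotated with a probeset category. Only the column name adj.P.Val follows what topTable() returns.

```r
# Toy results table standing in for annotated topTable() output.
# All IDs, categories and p-values are made up for illustration.
results <- data.frame(
  probeset  = c("ps1", "ps2", "ps3", "ps4", "ps5"),
  category  = c("main", "normgene->intron", "main", "control->affx", "main"),
  adj.P.Val = c(0.001, 0.002, 0.20, 0.004, 0.03),
  stringsAsFactors = FALSE
)

# Step 1: drop control probesets, keeping only the "main" category.
main_only <- results[results$category == "main", ]

# Step 2: apply the significance cutoff and rank by adjusted p-value.
significant <- main_only[main_only$adj.P.Val < 0.05, ]
significant <- significant[order(significant$adj.P.Val), ]

print(significant$probeset)
```

The point of the sketch is only the order of operations: the control categories are removed first, so they cannot appear in the final ranked list whatever their p-values are.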

(2) This type of array also includes lincRNA transcripts, and I would like to include them in my analysis. However, I am using hugene20sttranscriptcluster.db for annotation and these lincRNAs are not included. Would this package be able to handle them?

(3) I tried to build my own annotation package with makeDBPackage, based on the .csv annotation file from Affymetrix, but I got an error:  Error in `[.data.frame`(csvFile, , GenBank IDName) : undefined columns selected
I have already read on this mailing list that makeDBPackage may expect an HGU133plus2-style annotation file. Would the AnnotationForge package be able to handle this?

Many thanks in advance for any help!

María Maqueda

Biomedical Engineering Research Centre (CREB)
Universitat Politècnica de Catalunya (UPC)

 -- output of sessionInfo(): 

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
[5] LC_TIME=Spanish_Spain.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] human.db0_2.9.0                       AnnotationForge_1.2.2                
 [3] hugene20sttranscriptcluster.db_2.12.1 org.Hs.eg.db_2.9.0                   
 [5] AnnotationDbi_1.22.6                  BiocInstaller_1.12.0                 
 [7] limma_3.16.8                          pd.hugene.2.0.st_3.8.0               
 [9] oligo_1.24.2                          Biobase_2.20.1                       
[11] oligoClasses_1.22.0                   BiocGenerics_0.6.0                   
[13] RSQLite_0.11.4                        DBI_0.2-7                            

loaded via a namespace (and not attached):
 [1] affxparser_1.32.3     affyio_1.28.0         annotate_1.38.0      
 [4] Biostrings_2.28.0     bit_1.1-10            codetools_0.2-8      
 [7] ff_2.2-12             foreach_1.4.1         genefilter_1.42.0    
[10] GenomicRanges_1.12.5  IRanges_1.18.4        iterators_1.0.6      
[13] preprocessCore_1.22.0 splines_3.0.1         stats4_3.0.1         
[16] survival_2.37-4       tools_3.0.1           XML_3.98-1.1         
[19] xtable_1.7-1          zlibbioc_1.6.0 

Sent via the guest posting facility at bioconductor.org.

