[BioC] Analysis of Affymetrix Human Gene 2.0 ST arrays

James W. MacDonald jmacdon at uw.edu
Fri Nov 29 15:04:20 CET 2013

Hi Maria,

On 11/29/2013 6:18 AM, María Maqueda [guest] wrote:
> Dear all,
>   I am analyzing a set of Affymetrix Human Gene 2.0 ST arrays. This is my first time working with this type of array, so I have a few general questions. I would very much appreciate any advice you could give.
> (1) I have obtained different lists of differentially expressed genes (using eBayes() from limma). In those lists, some control transcripts are popping up (e.g. normgene -> intron, among other categories). I was not expecting this type of transcript at this point. In theory, no control transcripts should appear after normalization, am I right? Have you experienced this?
> I have read that one possibility is to use getMainProbes before the topTable selection, but I wonder whether something could have been wrong from the beginning with my normalization process (I used rma() at the transcript level, from oligo). What is your opinion?

I don't think this has anything to do with the normalization. Instead, I
think it is a combination of poorly designed probes and highly expressed
genes that have enough unprocessed mRNA transcripts with their introns
still intact to be detected (remember that the first step of sample
processing stops all enzymatic activity very quickly, so any mRNA that
is still being transcribed, or has only just finished transcription,
will likely still contain introns).
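
If you would rather not see the control probesets in your results, it
usually makes sense to drop them before the multiple-testing adjustment
instead of deleting rows from the final list. Below is a minimal base-R
sketch of that idea on a toy table. The probeset IDs, category labels
and p-values are made up for illustration; on real data you would filter
the summarized ExpressionSet (for example with the getMainProbes function
you mention) before fitting the model with limma.

```r
# Toy stand-in for per-probeset test results plus a probeset category.
# All values are invented; they are not from a real HuGene 2.0 ST analysis.
tab <- data.frame(
  probeset = c("ps1", "ps2", "ps3", "ps4", "ps5"),
  category = c("main", "normgene->intron", "main", "normgene->exon", "main"),
  P.Value  = c(0.001, 0.0005, 0.01, 0.03, 0.6),
  stringsAsFactors = FALSE
)

# Keep only the probesets that target annotated transcripts.
main_only <- tab[tab$category == "main", ]

# Adjust p-values after filtering, so that the control probesets
# do not count towards the multiple-testing burden.
main_only$adj.P.Val <- p.adjust(main_only$P.Value, method = "BH")

# Rank the remaining probesets and apply a significance cutoff.
main_only <- main_only[order(main_only$adj.P.Val), ]
hits <- main_only[main_only$adj.P.Val < 0.05, ]
print(hits)
```

The same order of operations applies to the real workflow: filter first,
then fit and rank.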

> (2) This type of array also includes lincRNA transcripts, and I am interested in considering them for my analysis. The thing is that I am using hugene20sttranscriptcluster.db for annotation, and these lincRNAs are not included. Would this library be able to handle them?

Hypothetically yes, but as of now not really. Not many lincRNAs seem to
have been annotated with Entrez Gene IDs yet, and until that happens
they won't appear in the annotation packages. Even for those that do
have Entrez Gene IDs, the information stops there: you go to NCBI and it
just says that the lincRNA is supposed to exist, but nothing else.
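
To see how many of your probesets fall into that gap, you can count the
ones that come back without an Entrez Gene ID. The sketch below does
this on a small hand-made data.frame shaped like the result of
AnnotationDbi's select() with PROBEID, ENTREZID and SYMBOL columns. The
probeset IDs are hypothetical placeholders; with the real package you
would build the table from hugene20sttranscriptcluster.db instead.

```r
# Stand-in for an annotation lookup result (one row per probeset).
# Probeset IDs are hypothetical; NA marks probesets with no Entrez Gene ID.
annot <- data.frame(
  PROBEID  = c("p1", "p2", "p3", "p4"),
  ENTREZID = c("7157", NA, "672", NA),
  SYMBOL   = c("TP53", NA, "BRCA1", NA),
  stringsAsFactors = FALSE
)

# Probesets that the annotation package cannot map to a gene.
unannotated <- annot$PROBEID[is.na(annot$ENTREZID)]

# Report how much of the probeset list is unannotated.
cat(length(unannotated), "of", nrow(annot),
    "probesets have no Entrez Gene ID\n")
```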

> (3) I tried to make my own annotation package through makeDBPackage based on the .csv annotation file from Affy, but I got an error: Error in `[.data.frame`(csvFile, , GenBank IDName) : undefined columns selected
> I have already read on this mailing list that makeDBPackage may expect an HGU133plus2 annotation “style”. Would the AnnotationForge library be able to handle this?

AnnotationForge cannot handle the csv files for these arrays directly,
as they are completely different from those for the old-style 3'-biased
arrays like the hgu133plus2 that you mention. I have a function I can
give you that makes the input file for the annotation package, but I
don't think it is worth it: it is the same function I already used to
make the annotation package you can get from BioC, so you would go
through all that effort only to end up with something you can already
download.

But if you want it, I will send it to you.



> Many thanks in advance for any help!
> María Maqueda
> Biomedical Engineering Research Centre (CREB)
> Universitat Politècnica de Catalunya (UPC)
>   -- output of sessionInfo():
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> locale:
> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
> [5] LC_TIME=Spanish_Spain.1252
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
> other attached packages:
>   [1] human.db0_2.9.0                       AnnotationForge_1.2.2
>   [3] hugene20sttranscriptcluster.db_2.12.1 org.Hs.eg.db_2.9.0
>   [5] AnnotationDbi_1.22.6                  BiocInstaller_1.12.0
>   [7] limma_3.16.8                          pd.hugene.2.0.st_3.8.0
>   [9] oligo_1.24.2                          Biobase_2.20.1
> [11] oligoClasses_1.22.0                   BiocGenerics_0.6.0
> [13] RSQLite_0.11.4                        DBI_0.2-7
> loaded via a namespace (and not attached):
>   [1] affxparser_1.32.3     affyio_1.28.0         annotate_1.38.0
>   [4] Biostrings_2.28.0     bit_1.1-10            codetools_0.2-8
>   [7] ff_2.2-12             foreach_1.4.1         genefilter_1.42.0
> [10] GenomicRanges_1.12.5  IRanges_1.18.4        iterators_1.0.6
> [13] preprocessCore_1.22.0 splines_3.0.1         stats4_3.0.1
> [16] survival_2.37-4       tools_3.0.1           XML_3.98-1.1
> [19] xtable_1.7-1          zlibbioc_1.6.0

James W. MacDonald, M.S.
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
