[BioC] Reading Affy CEL files
James W. MacDonald
jmacdon at uw.edu
Fri May 31 19:05:24 CEST 2013
Hi Ranjani,
On 5/31/2013 12:53 PM, Ranjani R [guest] wrote:
> I am a newbie to Affy. Thanks for your help.
>
> I am processing CEL files through R (Affy package) and am having some basic issues that I am not finding satisfactory answers to (have googled).
> The chip used is hugene11stv1. I also am using the hugene11stprobeset.db to try to do probeset --> Symbol translation.
> Essentially, I want to create a file with gene expression data, with genes * samples as my final matrix.
>
> Code:
> setwd(wDir);
> Data<- ReadAffy();
> eset<- rma(Data);
> write.exprs(eset,file="geneExpData.txt", sep="\t", quote = F);
>
> When I analyze the file written, I see that the number of columns is as I expect(number samples) but there are 33,297 genes.
> Please help me understand a few fundamental aspects here:
>
> 1. I tried translating these Affy IDs to gene symbols to see if that would make my analysis easier.
> Here are some things I tried
>
> Try 1:
> symbols<- getSYMBOL(as.character(expr.matrix[,1]), "hugene11stprobeset"); --> Not quite working. Only ~175 of the probeset IDs are getting translated.
There are two problems here. First, the affy package isn't designed for
this array, and in fact won't let you proceed if you upgrade to the new
version of Bioconductor. You should really be using either oligo or xps
(both BioC packages) for the analysis of this array.
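
As a minimal sketch of the oligo route (not run against your data): the
helper name and its cel_dir argument are just illustrative, and it
assumes oligo and the matching platform design package
(pd.hugene.1.1.st.v1 for this chip) are installed.

```r
# Hypothetical helper: read Gene ST CEL files with oligo and run RMA.
# Assumes the oligo and pd.hugene.1.1.st.v1 packages are installed.
rma_transcript_level <- function(cel_dir = ".") {
  # Collect the (uncompressed) CEL files in the directory.
  cel_files <- list.files(cel_dir, pattern = "\\.cel$",
                          ignore.case = TRUE, full.names = TRUE)
  if (length(cel_files) == 0) {
    stop("no CEL files found in ", cel_dir)
  }
  # read.celfiles() picks the platform design package from the CEL headers.
  raw <- oligo::read.celfiles(cel_files)
  # target = "core" summarizes at the transcript level;
  # target = "probeset" would summarize at the probeset level instead.
  oligo::rma(raw, target = "core")
}
```

Something like eset <- rma_transcript_level(wDir) would then give you an
ExpressionSet that you can pass to write.exprs() as before.
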
Second, the affy package can only summarize these arrays at the
transcript level, but you are trying to annotate with a package that
assumes you have summarized at the probeset level (where each probeset
interrogates only a small portion of the transcript, often just a
single exon). The IDs don't correspond, which is why so few of them
get translated. If you want to annotate your transcript-level data,
you need the hugene11sttranscriptcluster.db package.
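
A sketch of that annotation step, assuming eset was summarized at the
transcript level and that AnnotationDbi and
hugene11sttranscriptcluster.db are installed. Both helper names are
made up for illustration. select() can return more than one symbol per
ID, and keeping only the first one is a simplification, not the only
reasonable choice.

```r
# Hypothetical helper: given the IDs to annotate and a data frame with
# PROBEID and SYMBOL columns (as returned by select()), return one symbol
# per ID, in the order of 'ids'. IDs without a match get NA.
first_symbol_per_id <- function(ids, ann) {
  ann <- ann[!duplicated(ann$PROBEID), ]   # keep the first row per ID
  ann$SYMBOL[match(ids, ann$PROBEID)]
}

# Hypothetical helper: build a genes x samples table with a SYMBOL column
# from a transcript-level ExpressionSet.
annotate_transcript_clusters <- function(eset) {
  ids <- Biobase::featureNames(eset)
  ann <- AnnotationDbi::select(
    hugene11sttranscriptcluster.db::hugene11sttranscriptcluster.db,
    keys = ids, columns = "SYMBOL", keytype = "PROBEID"
  )
  data.frame(PROBEID = ids,
             SYMBOL = first_symbol_per_id(ids, ann),
             Biobase::exprs(eset),
             check.names = FALSE)
}
```

Expect some NAs in the SYMBOL column even with the right package, since
not every transcript cluster on the array (control probesets, for
example) corresponds to a named gene.
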
Best,
Jim
> Try 2:
> symbs<- mget(featureNames(eset), hugene11stprobesetSYMBOL, ifnotfound =NA);
> symbs<- unlist(symbs)
> mat<- eset; # make a copy
> featureNames(mat)<- ifelse(!is.na(symbs), symbs, featureNames(mat))
>
> Many NAs.
>
> Can you please help me understand what is happening here.
>
>
> -- output of sessionInfo():
>
> R version 2.15.3 (2013-03-01)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] hugene11stv1cdf_2.3.0 affy_1.36.1 Biobase_2.18.0
> [4] BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.26.0 BiocInstaller_1.8.3 preprocessCore_1.20.0
> [4] tools_2.15.3 zlibbioc_1.4.0
>
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099