[BioC] Empty featureData in ExpressionSet generated by affy::justRMA
David Shih
djh.shih at gmail.com
Fri May 20 21:12:57 CEST 2011
Hi Jim,
It turns out that the assayData slot was not filled during the
post-processing normalization.
The original ExpressionSet object returned by justRMA() correctly
populates the assayData slot.
> exprs(eset)[1:5, 1:5]
5500024024214122006605.A01.CEL 5500024024214122006605.A02.CEL
1007_s_at 9.261711 9.672637
1053_at 5.732471 7.267763
117_at 5.316724 4.759532
121_at 6.999956 6.848935
1255_g_at 4.614048 4.572704
5500024024214122006605.A03.CEL 5500024024214122006605.A05.CEL
1007_s_at 9.780707 9.684070
1053_at 8.433437 7.870544
117_at 4.760007 4.783464
121_at 6.647044 6.635504
1255_g_at 4.577906 4.548474
5500024024214122006605.A06.CEL
1007_s_at 9.384182
1053_at 7.388526
117_at 5.662874
121_at 6.433151
1255_g_at 4.640982
I'll have to update the post-processing normalization script to
reconstruct the ExpressionSet object properly.
On another note, it seems that the probe ids in the expression matrix
returned by exprs() are largely in the same order as the keys returned
by mappedkeys(hgu133a2ACCNUM):
> keys <- mappedkeys(hgu133a2ACCNUM);
> probes <- rownames(exprs(eset));
> length(probes)
[1] 22277
> length(keys)
[1] 22277
> sum(probes == keys)
[1] 22230
> probes[probes != keys]
[1] "AFFX-hum_alu_at" "AFFX-HUMGAPDH/M33197_3_at"
[3] "AFFX-HUMGAPDH/M33197_5_at" "AFFX-HUMGAPDH/M33197_M_at"
[5] "AFFX-HUMISGF3A/M97935_3_at" "AFFX-HUMISGF3A/M97935_5_at"
[7] "AFFX-HUMISGF3A/M97935_MA_at" "AFFX-HUMISGF3A/M97935_MB_at"
[9] "AFFX-HUMRGE/M10098_3_at" "AFFX-HUMRGE/M10098_5_at"
[11] "AFFX-HUMRGE/M10098_M_at" "AFFX-LysX-3_at"
[13] "AFFX-LysX-5_at" "AFFX-LysX-M_at"
[15] "AFFX-M27830_3_at" "AFFX-M27830_5_at"
[17] "AFFX-M27830_M_at" "AFFX-PheX-3_at"
[19] "AFFX-PheX-5_at" "AFFX-PheX-M_at"
[21] "AFFX-r2-Bs-dap-3_at" "AFFX-r2-Bs-dap-5_at"
[23] "AFFX-r2-Bs-dap-M_at" "AFFX-r2-Bs-lys-3_at"
[25] "AFFX-r2-Bs-lys-5_at" "AFFX-r2-Bs-lys-M_at"
[27] "AFFX-r2-Bs-phe-3_at" "AFFX-r2-Bs-phe-5_at"
[29] "AFFX-r2-Bs-phe-M_at" "AFFX-r2-Bs-thr-3_s_at"
[31] "AFFX-r2-Bs-thr-5_s_at" "AFFX-r2-Bs-thr-M_s_at"
[33] "AFFX-r2-Ec-bioB-3_at" "AFFX-r2-Ec-bioB-5_at"
[35] "AFFX-r2-Ec-bioB-M_at" "AFFX-r2-Ec-bioC-3_at"
[37] "AFFX-r2-Ec-bioC-5_at" "AFFX-r2-Ec-bioD-3_at"
[39] "AFFX-r2-Ec-bioD-5_at" "AFFX-r2-P1-cre-3_at"
[41] "AFFX-r2-P1-cre-5_at" "AFFX-ThrX-3_at"
[43] "AFFX-ThrX-5_at" "AFFX-ThrX-M_at"
[45] "AFFX-TrpnX-3_at" "AFFX-TrpnX-5_at"
[47] "AFFX-TrpnX-M_at"
The orders are the same, except that the order of the some control probes have
changed.
But I guess this match is not guaranteed to be true in general.
Thank you for you help.
Regards,
David
On Thu, May 19, 2011 at 09:55:51AM -0400, James W. MacDonald wrote:
> Hi David,
>
> On 5/18/2011 2:04 PM, David Shih wrote:
> >Hi Jim,
> >
> >Thank you for clarifying the purpose of the featureData slot.
> >The assayData slot also appears empty in my ExpressionSet.
>
> It's not empty, you just didn't access it correctly.
>
> > showMethods(exprs, class="ExpressionSet", includeDefs = TRUE)
> Function: exprs (package Biobase)
> object="ExpressionSet"
> function (object)
> assayDataElement(object, "exprs")
>
> and
>
> > assayDataElement
> function (object, elt)
> assayData(object)[[elt]]
> <environment: namespace:Biobase>
>
> So exprs() is extracting data from your assayData slot, but you have
> to choose the "exprs" element of the AssayData object that
> assayData() returns.
>
> >
> >The rownames are missing in my expression matrix extracted using.
> >>exprs(eset)
>
> Well, they *should* be there, so this indicates some bug in the
> processing. What does exprs(eset)[1:5,1:5} look like, exactly?
>
> Best,
>
> Jim
>
>
> >
> >I was hoping to fill the rownames of my matrix using keys obtained
> >from
> >>keys<- mappedkeys(hgu133a2ACCUM)
> >
> >I checked that all the keys are mapped, and I am under the impression
> >that they should be for hgu133a2ACCUM.
> >
> >My main concern is that hgu133a2ACCUM may not return the keys
> >(probe ids) in the same order as they appear in the expression matrix,
> >which would cause the probes to be incorrectly mapped.
> >
> >I have no problem with normalizing older platforms like U133. All the
> >probe ids are automatically assigned upon extraction with exprs().
> >However, the hthgu133a platform does not appear to be as
> >straightforward.
> >
> >I had a similar problems with the HT Human Gene 1.0 ST platform.
> >I ended up using the oligo package, whicih provides a getNetAffx()
> >function to retrieve probe annotation data.
> >However, the existing normalization code precludes from switching to
> >the oligo package.
> >
> >Best regards,
> >David
> >
> >On Wed, May 18, 2011 at 01:51:01PM -0400, James W. MacDonald wrote:
> >>Hi David,
> >>
> >>On 5/18/2011 1:31 PM, David Shih wrote:
> >>>Dear list,
> >>>
> >>>I am using the affy package to normalize expression data on the HT
> >>>Human Genome U133A platform. According to earlier posting on the list,
> >>>this is equivalent to U133Av2.
> >>>
> >>>I used the justRMA() function to normalize the data, and obtained an
> >>>ExpressionSet object with an empty featureData slot.
> >>
> >>You may be misunderstanding the purpose of the featureData slot.
> >>According to ?eSet-class:
> >>
> >>'featureData': Contains variables describing features (i.e., rows
> >> in 'assayData') unique to this experiment. Use the
> >> 'annotation' slot to efficiently reference feature data
> >> common to the annotation package used in the experiment.
> >> Class: 'AnnotatedDataFrame-class'
> >>
> >>The probeset IDs are not unique to any particular experiment, so are
> >>stored as part of the assayData, which can be extracted using e.g.,
> >>exprs():
> >>
> >>>data(sample.ExpressionSet)
> >>>head(exprs(sample.ExpressionSet))
> >> A B C D E
> >>F G
> >>AFFX-MurIL2_at 192.7420 85.75330 176.7570 135.5750 64.49390
> >>76.3569 160.5050
> >>AFFX-MurIL10_at 97.1370 126.19600 77.9216 93.3713 24.39860
> >>85.5088 98.9086
> >>AFFX-MurIL4_at 45.8192 8.83135 33.0632 28.7072 5.94492
> >>28.2925 30.9694
> >>AFFX-MurFAS_at 22.5445 3.60093 14.6883 12.3397 36.86630
> >>11.2568 23.0034
> >>AFFX-BioB-5_at 96.7875 30.43800 46.1271 70.9319 56.17440
> >>42.6756 86.5156
> >>AFFX-BioB-M_at 89.0730 25.84610 57.2033 69.9766 49.58220
> >>26.1262 75.0083
> >>
> >>Best,
> >>
> >>Jim
> >>
> >>
> >>>
> >>>I understand that I can retrive the annotation information from:
> >>>library(hgu133a2.db)
> >>>
> >>>However, I am uncertain whether the probes in the annotation library
> >>>are in the same order as the ExpressionSet object.
> >>>
> >>>Can I add the probe IDs using the following code?
> >>>
> >>>library(hgu133a2.db)
> >>>keys<- mappedkeys(hgu133ai2ACCNUM);
> >>>expr<- exprs(eset);
> >>>rownames(expr)<- keys;
> >>>
> >>>
> >>>Best regards,
> >>>
> >>>David Shih, Graduate Student
> >>>The Hospital for Sick Children
> >>>Brain Tumour Research Centre
> >>>101 College Street, TMDT-11-401M
> >>>Toronto, ON M5G1L7
> >>>Canada
> >>>Tel: 416-813-7654 x4327
> >>>
> >>>
> >>>>sessionInfo()
> >>>R version 2.12.2 (2011-02-25)
> >>>Platform: x86_64-unknown-linux-gnu (64-bit)
> >>>
> >>>locale:
> >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> >>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> >>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> >>> [9] LC_ADDRESS=C LC_TELEPHONE=C
> >>>[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >>>
> >>>attached base packages:
> >>>[1] stats graphics grDevices utils datasets methods base
> >>>
> >>>other attached packages:
> >>>[1] hgu133a2.db_2.4.5 org.Hs.eg.db_2.4.6 RSQLite_0.9-4
> >>>[4] DBI_0.2-5 AnnotationDbi_1.12.0 hthgu133acdf_2.7.0
> >>>[7] affy_1.28.0 Biobase_2.10.0
> >>>
> >>>loaded via a namespace (and not attached):
> >>>[1] affyio_1.18.0 preprocessCore_1.12.0 tools_2.12.2
> >>>
> >>>_______________________________________________
> >>>Bioconductor mailing list
> >>>Bioconductor at r-project.org
> >>>https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>>Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >>--
> >>James W. MacDonald, M.S.
> >>Biostatistician
> >>Douglas Lab
> >>University of Michigan
> >>Department of Human Genetics
> >>5912 Buhl
> >>1241 E. Catherine St.
> >>Ann Arbor MI 48109-5618
> >>734-615-7826
> >>**********************************************************
> >>Electronic Mail is not secure, may not be read every day, and should
> >>not be used for urgent or sensitive issues
> >>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Douglas Lab
> University of Michigan
> Department of Human Genetics
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
> 734-615-7826
> **********************************************************
> Electronic Mail is not secure, may not be read every day, and should
> not be used for urgent or sensitive issues
>
More information about the Bioconductor
mailing list