[BioC] GEOquery: how to extract experimental data? (confused)
J.delasHeras at ed.ac.uk
Tue Aug 16 14:25:17 CEST 2011
Quoting Sean Davis <sdavis2 at mail.nih.gov> on Tue, 16 Aug 2011 07:36:41 -0400:
> On Tue, Aug 16, 2011 at 7:20 AM, <J.delasHeras at ed.ac.uk> wrote:
>>
>> I have been until now downloading GEO data directly to my computer and using
>> basic R functions to load tables and process them.
>> It works, but I figured I would probably save time if I learn to use the
>> GEOquery package, which looks promising.
>>
>> However, I'm failing tremendously at my first attempt. I can get a lot of
>> good information, except the actual experiment data... and it seems to be
>> there, but can't get to it!
>>
>> Example. I'm trying to get GSE19044, which contains 42 samples and uses the
>> Illumina WG6 platform, which is great as I'm familiar with it.
>>
>> so I do:
>>
>> library(GEOquery)
>> u = getGEO('GSE19044')
>> show(u)
>>
>>> show(u)
>>
>> $GSE19044_series_matrix.txt.gz
>> ExpressionSet (storageMode: lockedEnvironment)
>> assayData: 45281 features, 42 samples
>> element names: exprs
>> protocolData: none
>> phenoData
>> sampleNames: GSM471318, GSM471319, ..., GSM471359 (42 total)
>> varLabels and varMetadata description:
>> title: NA
>> geo_accession: NA
>> ...: ...
>> data_row_count: NA
>> (39 total)
>> featureData
>> featureNames: ILMN_1212602, ILMN_1212603, ..., ILMN_3163582 (45281 total)
>> fvarLabels and fvarMetadata description:
>> ID: NA
>> Species: NA
>> ...: ...
>> SPOT_ID: NA
>> (31 total)
>> additional fvarMetadata: Column, Description
>> experimentData: use 'experimentData(object)'
>> Annotation: GPL6887
>>
>> It looks good. It looks like what I want is the 'assayData'. But I can't get
>> to it.
>>
>> 'u' is a list, containing one element...
>>>
>>> class(u)
>>
>> [1] "list"
>>>
>>> length(u)
>>
>> [1] 1
>>
>>> class(u[[1]])
>>
>> [1] "ExpressionSet"
>> attr(,"package")
>> [1] "Biobase"
>>
>> ok, so I rename that, and look at its structure:
>>
>> eset<-u[[1]]
>> str(eset)
>>
>>> str(eset)
>>
>> Formal class 'ExpressionSet' [package "Biobase"] with 7 slots
>> ..@ assayData :<environment: 0x0645ec5c>
>> ..@ phenoData :Formal class 'AnnotatedDataFrame' [package "Biobase"]
>> [...] (omitted for brevity)
>>
>> I can extract the sample names, the basic annotation/probe identity etc
>> easily:
>> eset@phenoData@data #samples
>> eset@featureData@data #annotation
>>
>> but how do I get into 'assayData'?
>> from the 'show(u)' it looks like it contains what I am after: 45281
>> features, 42 samples ... but it's class 'environment' and that's throwing me
>> off.
>>
>> I was looking into the GEOquery user guide, but I'm still none the wiser.
>
> Hi, Jose.
>
> Sorry this was confusing for you. Your eset object above is an
> ExpressionSet and is one of the standard classes for storing gene
> expression data in Bioconductor; GEOquery uses this class where
> possible to store GEO data so as to facilitate downstream processing
> with other Bioconductor packages. Typically, you can get the
> expression data from an ExpressionSet by doing:
>
> assayDataElement(eset,'exprs')
>
> or the simpler shorthand:
>
> exprs(eset)
>
> Similarly, to get the sample variables, you can do:
>
> pData(eset)
>
> To get more help on ExpressionSet, you can do
> help("ExpressionSet-class") and read the related Biobase vignette.
>
> I hope that clears things up.
>
> Sean
>
>
Just like that! Ha! Thank you very much for that!
I had never used the ExpressionSet class before, and I assumed that I
could simply access its contents directly by brute force, indicating
the right slot/component... I'll check the documentation on
ExpressionSet for its characteristics.
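
For reference, here is a minimal sketch of the accessor approach Sean describes, next to the slot access I had been trying. To keep it runnable without a download, it assumes a small hand-built ExpressionSet instead of the GSE19044 object; the probe names, sample names, and values are invented for illustration only.

```r
library(Biobase)

# Hypothetical toy data standing in for the GEO series matrix:
# 3 probes (rows) x 2 samples (columns), values are made up.
mat <- matrix(c(7.1, 8.2, 6.5, 7.3, 8.0, 6.9), nrow = 3,
              dimnames = list(c("probe1", "probe2", "probe3"),
                              c("sampleA", "sampleB")))
pd <- data.frame(title = c("control", "treated"),
                 row.names = colnames(mat))
eset <- ExpressionSet(assayData = mat,
                      phenoData = AnnotatedDataFrame(pd))

# Accessor functions, as suggested in the reply above.
expr    <- exprs(eset)   # numeric matrix, features x samples
samples <- pData(eset)   # data.frame, one row per sample

# The same expression matrix via the longer routes.
same_element <- identical(expr, assayDataElement(eset, "exprs"))
same_env     <- identical(expr, assayData(eset)[["exprs"]])

# pData() returns what the direct slot access eset@phenoData@data gives;
# fData(eset) is the matching accessor for eset@featureData@data.
same_pheno <- identical(samples, eset@phenoData@data)

dim(expr)
```

With the real data the same calls should apply to eset <- u[[1]] from getGEO('GSE19044'); the accessors are preferable to reaching into slots because they do not depend on how the class stores its data internally.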
thank you!
Jose
--
Dr. Jose I. de las Heras Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology Phone: +44 (0)131 6507090
Institute for Cell & Molecular Biology Fax: +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.