[BioC] GEOquery and Sample Subsets

Sean Davis sdavis2 at mail.nih.gov
Tue Jun 4 18:54:04 CEST 2013


On Tue, Jun 4, 2013 at 12:38 PM, Thomas H. Hampton
<Thomas.H.Hampton at dartmouth.edu> wrote:
> I am using GEOquery to establish sample subsets of GEO data -- that is, I would
> like to know which samples are replicates.
>
> I am doing it like this:
>
> gds505 <- getGEO("GDS505")
> Columns(gds505)
>
>> str(Columns(gds505))
> 'data.frame': 17 obs. of  4 variables:
>  $ sample       : Factor w/ 17 levels "GSM11805","GSM11814",..: 2 4 5 7 9 10 12 14 16 1 ...
>  $ disease.state: Factor w/ 2 levels "normal","RCC": 2 2 2 2 2 2 2 2 2 1 ...
>  $ individual   : Factor w/ 10 levels "001","005","011",..: 6 4 1 2 3 5 8 9 10 6 ...
>  $ description  : chr  "Value for GSM11814: C035 Renal Clear Cell Carcinoma U133A; src: Trizol...
>
> The problem I have is that the getGEO command retrieves a rather large object:
>
>> print(object.size(gds505), units="Mb")
> 12.6 Mb
>
> This takes up a lot of time and bandwidth if you plan to do it for thousands of accessions.
>
> Is there a way to retrieve less?

Hi, Tom.  Are you saying that you really want just the metadata to
start; in other words, you just want the sample information without
the expression values?

Sean
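
For reference, here is a minimal sketch of how replicate groups might be read off a sample table shaped like the Columns(gds505) output quoted above. The data frame below is a made-up stand-in (the sample IDs and values are hypothetical, not the real GDS505 contents), and treating samples that share a disease.state level as replicates is an assumption about what "replicates" means here. It does not address the download size; it only shows the grouping step once the table is in hand.

```r
# Hypothetical stand-in for Columns(gds505); IDs and values are illustrative only.
cols <- data.frame(
  sample        = c("GSM_A", "GSM_B", "GSM_C", "GSM_D"),
  disease.state = factor(c("RCC", "RCC", "normal", "normal")),
  individual    = factor(c("i1", "i2", "i1", "i2")),
  stringsAsFactors = FALSE
)

# Assumption: samples sharing a disease.state level form one replicate group.
# split() returns a named list with one character vector of sample IDs per level.
replicates <- split(as.character(cols$sample), cols$disease.state)

# Number of samples in each group.
group_sizes <- lengths(replicates)
```

With the real object, cols would presumably be replaced by Columns(gds505), and any other factor column (for example individual) could be used as the grouping variable instead.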


> I am happy to use R, BioConductor, bioperl or whatever.
>
> Best,
>
> Tom