I am using GEOquery to establish sample subsets of GEO data -- that is, I would
like to know which samples are replicates.

I am doing something like this:

library(GEOquery)

gds505 <- getGEO("GDS505")   # fetch the GDS record from GEO
Columns(gds505)              # per-sample annotation table

> str(Columns(gds505))
'data.frame': 17 obs. of  4 variables:
 $ sample       : Factor w/ 17 levels "GSM11805","GSM11814",..: 2 4 5 7 9 10 12 14 16 1 ...
 $ disease.state: Factor w/ 2 levels "normal","RCC": 2 2 2 2 2 2 2 2 2 1 ...
 $ individual   : Factor w/ 10 levels "001","005","011",..: 6 4 1 2 3 5 8 9 10 6 ...
 $ description  : chr  "Value for GSM11814: C035 Renal Clear Cell Carcinoma U133A; src: Trizol...
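
A minimal sketch of how replicate groups could be read off a table of this shape. It assumes a small made-up data frame standing in for Columns(gds505); the sample IDs and rows below are hypothetical and not taken from GDS505, and the names `cols`, `replicates` and `pairs` are only illustrative.

```r
# Hypothetical stand-in for Columns(gds505); rows are invented for illustration.
cols <- data.frame(
  sample        = c("GSM_A", "GSM_B", "GSM_C", "GSM_D"),
  disease.state = c("RCC", "RCC", "normal", "normal"),
  individual    = c("001", "005", "001", "005"),
  stringsAsFactors = FALSE
)

# Assumption: samples sharing a disease state count as one replicate group.
replicates <- split(cols$sample, cols$disease.state)

# Samples from the same individual, e.g. tumour/normal pairs.
pairs <- split(cols$sample, cols$individual)
```

Here `replicates$RCC` holds the sample IDs annotated as RCC, and `pairs[["001"]]` holds the samples from individual 001. Only the small annotation table is needed for this step, not the expression values.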

The problem I have is that the getGEO command retrieves a rather large object:

> print(object.size(gds505), units="Mb")
12.6 Mb

This takes up a lot of time and bandwidth if you plan to do it for thousands of accessions.

Is there a way to retrieve less?

I am happy to use R, Bioconductor, BioPerl or whatever.

Best,

Tom

