[BioC] GEOquery: getGEO() doesn't work (error "invalid 'nlines' argument")
ecsi at gmx.net
Tue May 29 16:17:33 CEST 2012
Hi Sean,
> The "system.file" part of your command above is not necessary (and is
> probably the problem). The system.file() function is for locating files that came
> with a specific software package. So, you want something like:
>
> GSE19711 <- getGEO(filename = 'mypath/GSE19711_family.soft.gz')
This works! Thanks a lot!
> Note that you will have to do a fair bit of work to get the data out
> of a SOFT format file. Instead, you should consider using a GSEMatrix
> file. Alternatively, download the raw data and use a
> platform-appropriate package to read in and analyze the data.
> Finally, note that you do not need to download files separately;
> getGEO() can fetch them for you given an accession such as 'GSE19711'.
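A minimal sketch of fetching a series straight from GEO by accession instead of downloading the SOFT file by hand, assuming GEOquery and Biobase are installed and network access is available when it is called. fetch_geo_series() is a hypothetical wrapper; getGEO() (with GSEMatrix = TRUE) and getGEOSuppFiles() are the GEOquery functions doing the work.

```r
# Hypothetical wrapper (not part of GEOquery). Requires GEOquery/Biobase
# installed and network access at call time; defining it needs neither.
fetch_geo_series <- function(accession, destdir = tempdir(), raw = FALSE) {
  # GSEMatrix = TRUE downloads the series matrix file(s) and returns a list
  # with one ExpressionSet per series matrix file (usually one per platform).
  esets <- GEOquery::getGEO(accession, GSEMatrix = TRUE, destdir = destdir)
  if (raw) {
    # Optionally also download the supplementary (raw) files into destdir,
    # for reading with a platform-appropriate package.
    GEOquery::getGEOSuppFiles(accession, baseDir = destdir)
  }
  esets
}

# Example (requires network):
# esets <- fetch_geo_series("GSE19711")
# eset  <- esets[[1]]
# dim(Biobase::exprs(eset))   # features x samples
# head(Biobase::pData(eset))  # one row of sample annotation per GSM
```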
Well, my problem is that I am not quite sure about the "best" way to get
the data I need. I'll try to give an example:
We have the GEO Series GSE19711. For all the samples of this series, I
need some specific information. Let's use the first sample of GSE19711
as an example: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM491937
I need to know the age of the patient ("ageatdiagnosis"), whether it is a
pre- or post-treatment sample, and the sex of the patient (in this case all
samples are from women), and possibly other information for other
series. And of course I need the data matrix itself, so that I can
eventually build something similar to an ExpressionSet, but using
the methylumi package, because all of this concerns methylation rather
than gene expression.
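Per-sample fields like these are usually stored on each GSM record as "key: value" strings in its characteristics field. A minimal base-R sketch, assuming that format and assuming the sample table has columns named characteristics_ch1, characteristics_ch1.1, and so on (as in pData() of an ExpressionSet returned by getGEO()). Both helper names are hypothetical, and recent GEOquery versions may already add parsed columns, in which case this step is unnecessary.

```r
# Hypothetical helpers (not part of GEOquery): turn GEO "key: value"
# characteristics strings into one column per key. Base R only.

# Split c("ageatdiagnosis: 52", ...) into c(ageatdiagnosis = "52", ...).
parse_characteristics <- function(x) {
  x <- x[!is.na(x) & nzchar(x)]
  keys <- trimws(sub(":.*$", "", x))             # text before the first colon
  vals <- ifelse(grepl(":", x),
                 trimws(sub("^[^:]*:", "", x)),  # text after the first colon
                 NA_character_)
  setNames(vals, keys)
}

# Build a samples x keys data frame from the characteristics_ch1* columns of pd.
characteristics_to_df <- function(pd) {
  cols <- grep("^characteristics_ch1", colnames(pd), value = TRUE)
  ch <- as.matrix(pd[, cols, drop = FALSE])      # factors become character
  rows <- lapply(seq_len(nrow(ch)), function(i) parse_characteristics(ch[i, ]))
  keys <- unique(unlist(lapply(rows, names)))
  # A key missing for a sample becomes NA in that sample's row.
  m <- do.call(rbind, lapply(rows, function(r) unname(r[keys])))
  colnames(m) <- keys
  rownames(m) <- rownames(pd)
  data.frame(m, check.names = FALSE, stringsAsFactors = FALSE)
}
```

Values stay as character; convert them afterwards as needed, e.g. as.numeric(df$ageatdiagnosis).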
I have to deal with several thousand samples from many different GEO
series, so I want to automate fetching the phenodata of the patients.
While searching for a solution, I found the GEOquery package and thought
it would be the best way to handle the SOFT files, because these files
are available for all the series I want to analyze and, I thought,
contain all the available information. (So far I have only worked with
expression data, where I used the raw files, but phenodata files were
always available as well, so it was a lot easier.)
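For thousands of samples across many series, the fetching can be wrapped in a loop that keeps going when one accession fails. A minimal sketch, assuming GEOquery and Biobase are installed; collect_phenodata() is a hypothetical helper, and its fetch and pheno arguments exist only so the loop can be exercised offline with stand-in functions. Pointing destdir at a persistent directory (rather than the default tempdir()) should let getGEO() reuse files it has already downloaded on later runs.

```r
# Hypothetical helper: fetch many GEO series and keep each one's sample table.
# Defaults assume GEOquery/Biobase are installed and the network is reachable;
# `fetch` and `pheno` can be replaced by stand-ins for offline testing.
collect_phenodata <- function(
    accessions,
    destdir = tempdir(),
    fetch = function(acc) GEOquery::getGEO(acc, GSEMatrix = TRUE, destdir = destdir),
    pheno = function(eset) Biobase::pData(eset)) {
  out <- list()
  for (acc in accessions) {
    # A failing accession (typo, network error, ...) is reported and skipped
    # instead of aborting the whole run.
    esets <- tryCatch(fetch(acc), error = function(e) {
      warning(sprintf("skipping %s: %s", acc, conditionMessage(e)), call. = FALSE)
      NULL
    })
    if (is.null(esets)) next
    # getGEO() returns a list of ExpressionSets, so keep a list of
    # sample tables under each series accession.
    out[[acc]] <- lapply(esets, pheno)
  }
  out
}
```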
If you can think of a better way to get the data I need and to
map each sample to its phenodata easily, please let me know; I would
be very grateful.
Simone