[BioC] Adding annotations to GSE datasets
Sean Davis
sdavis2 at mail.nih.gov
Thu May 8 16:15:12 CEST 2014
On Thu, May 8, 2014 at 8:21 AM, Marcelo Pereira <marcelops at gmail.com> wrote:
> That is all because I am interested in the expression values for some pairs
> of genes.
>
> If I had something like this:
>
> GSM278765 GSM278766 GSM278767 ...
> A1BG 5.459950 5.548725 5.477436 ...
> NAT2 6.728919 6.329578 6.570104 ...
> ADA 6.861095 7.005730 7.235361 ...
> CDH2 9.660035 9.189507 9.740223 ...
> ... 5.644313 5.898675 5.475838 ...
> ... 7.838040 7.564335 8.397569 ...
>
> Then I could extract lines for the genes of interest (for example, 'A1BG'
> and 'ADA'), and then plot scatterplots, compute correlation coefficients,
> etc...
Something like this might work:
plot(exprs(gset[[1]])[fData(gset[[1]])$Gene=='A1BG',])
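To get the gene-labelled table you describe and compare two genes directly, a minimal base-R sketch could look like the following. It is an illustration, not tested against GSE11024: `expr` and `feat` are small toy stand-ins for exprs(gset[[1]]) and fData(gset[[1]]) built from the numbers quoted in this thread, and the names `symbols`, `labelled` and `pair` are made up for this example. It assumes the annotation "ID" column matches the row names of the expression matrix.

```r
# Toy stand-ins for exprs(gset[[1]]) and fData(gset[[1]]) (assumed shapes).
expr <- matrix(
  c(5.459950, 5.548725, 5.477436,
    6.728919, 6.329578, 6.570104,
    6.861095, 7.005730, 7.235361,
    9.660035, 9.189507, 9.740223),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("1", "10", "100", "1000"),
                  c("GSM278765", "GSM278766", "GSM278767"))
)
feat <- data.frame(ID = c("1", "10", "100", "1000"),
                   Gene = c("A1BG", "NAT2", "ADA", "CDH2"),
                   stringsAsFactors = FALSE)

# Line the annotation up with the expression rows by ID.
symbols <- as.character(feat$Gene[match(rownames(expr), feat$ID)])

# Fall back to the original ID where no symbol is available.
missing <- is.na(symbols) | symbols == ""
symbols[missing] <- rownames(expr)[missing]

# Use the symbols as row names; make.unique() disambiguates repeated symbols.
labelled <- expr
rownames(labelled) <- make.unique(symbols)

# Extract the genes of interest; drop = FALSE keeps the result a matrix.
pair <- labelled[c("A1BG", "ADA"), , drop = FALSE]

# One point per sample: Pearson correlation, plus a scatterplot when interactive.
r <- cor(pair["A1BG", ], pair["ADA", ])
if (interactive()) {
  plot(pair["A1BG", ], pair["ADA", ], xlab = "A1BG", ylab = "ADA")
}
```

With the real data, replacing the toy objects by expr <- exprs(gset[[1]]) and feat <- fData(gset[[1]]) should give the same kind of table, provided the "ID" assumption above holds.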
Sean
> The name of the genes for each line is the only detail that is not present
> in my dataset.
>
> What am I missing here?
>
> Thanks,
> Marcelo
>
>
>
> On Thu, May 8, 2014 at 7:42 AM, Marcelo Pereira <marcelops at gmail.com> wrote:
>>
>> Hello Sean,
>>
>> Thanks for your replies.
>>
>> I used to download all the CEL files, and then load, normalize and
>> generate the ExpressionSet output. All manually, and it was working fine!
>>
>> Then I found out about doing it automatically using the GEOquery library.
>> And this is what has been taking up my hours lately.
>>
>> The output of exprs(gset[[1]]) is the initial point where I got stuck
>> after a few minutes using the GEOquery library, because I have the
>> expression values, but not the gene names.
>>
>> GSM278765 GSM278766 GSM278767 ...
>> 1 5.459950 5.548725 5.477436 ...
>> 10 6.728919 6.329578 6.570104 ...
>> 100 6.861095 7.005730 7.235361 ...
>> 1000 9.660035 9.189507 9.740223 ...
>> 10000 5.644313 5.898675 5.475838 ...
>> 10001 7.838040 7.564335 8.397569 ...
>>
>> After that, I tried to manipulate the output in order to translate 1, 10,
>> 100, 1000, to the actual names of the genes. And my last resort was to
>> ask here at the forum.
>>
>> It is looking good already. I only need to have an extra column, with the
>> names of the genes.
>>
>> Thanks,
>> Marcelo
>>
>>
>> On Thu, May 8, 2014 at 7:14 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>>>
>>> On Thu, May 8, 2014 at 6:58 AM, Marcelo Pereira <marcelops at gmail.com>
>>> wrote:
>>> > Hi Sean,
>>> >
>>> > Thanks for your answer!
>>> >
>>> > That is great already.
>>> >
>>> > I can see the gene names now:
>>> >
>>> >> library(GEOquery)
>>> >> gset <- getGEO("GSE11024", GSEMatrix=TRUE, AnnotGPL=TRUE)
>>> >> head(fData(gset[[1]]))$Gene
>>> > [1] A1BG NAT2 ADA CDH2 AKT3 MED6
>>> > 17098 Levels: A1BG ABCB6 ABCC5 ABCC9 ABCF2 ABI1 ACOT8 ACTR2 ACTR3 ADA
>>> > ADAM8 AKT3 ... ZNF254
>>> >
>>> > But the data frame only contains these columns.
>>> >
>>> >> names(fData(gset[[1]]))
>>> > [1] "ID" "Gene" "UniGene" "Description" "Ensembl* Chr" "Start (bp)"
>>> > [7] "End (bp)" "Strand" "ORF" "SPOT_ID"
>>> >
>>> > Where is the expression information for each gene?
>>>
>>> exprs(gset[[1]])
>>>
>>> gset[[1]] is an ExpressionSet, so you should read a bit about
>>> ExpressionSets in the Biobase vignette as well as the help page.
>>>
>>> Sean
>>>
>>>
>>> >
>>> > Thanks,
>>> > Marcelo
>>> >
>>> >
>>> >
>>> > On Thu, May 8, 2014 at 6:24 AM, Sean Davis <sdavis2 at mail.nih.gov>
>>> > wrote:
>>> >
>>> >> Hi, Marcelo.
>>> >>
>>> >>
>>> >> On Wed, May 7, 2014 at 8:01 PM, Marcelo Pereira <marcelops at gmail.com>
>>> >> wrote:
>>> >> > Quick question:
>>> >> >
>>> >> > I am trying to import some GEO datasets, and having some issues with
>>> >> > the
>>> >> > annotations:
>>> >> >
>>> >> > I can download the GSE dataset using:
>>> >> >
>>> >> > gset <- getGEO("GSE11024", GSEMatrix=TRUE, AnnotGPL=TRUE)
>>> >> >
>>> >> >
>>> >> > However, it returns an ExpressionSet with the following
>>> >> > format:
>>> >> >
>>> >> > X1 X10 X100 X1000 ...
>>> >> > GSM278765
>>> >> > GSM278766
>>> >> > GSM278767
>>> >> > GSM278768
>>> >> > GSM278769
>>> >> > ...
>>> >>
>>> >> This is not what is returned by GEOquery, so you have done some
>>> >> manipulation (looks like you did a transpose on the expression
>>> >> matrix), it seems.
>>> >>
>>> >> > This is pretty much what I need, but I still need to translate (X1,
>>> >> > X10,
>>> >> > X100, X1000, etc...) to the actual names of the genes.
>>> >>
>>> >> library(GEOquery)
>>> >> gset <- getGEO("GSE11024", GSEMatrix=TRUE, AnnotGPL=TRUE)[[1]]
>>> >> head(fData(gset))
>>> >>
>>> >> The gene symbols are in the "Gene" column:
>>> >>
>>> >> genesymbols = fData(gset)$Gene
>>> >>
>>> >> Sean
>>> >>
>>> >>
>>> >> >
>>> >> > Any suggestions?
>>> >> >
>>> >> > Thanks,
>>> >> > Marcelo
>>> >> >
>>> >> > _______________________________________________
>>> >> > Bioconductor mailing list
>>> >> > Bioconductor at r-project.org
>>> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> >> > Search the archives:
>>> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> >>
>>> >
>>
>>
>