[BioC] ragene10st
Sebastien Gerega
seb at gerega.net
Thu Mar 5 23:30:09 CET 2009
Thanks to all those that offered advice. I have now managed to create an
annotation file for rat gene ST arrays. It can be downloaded from:
http://sydneybioinformatics.org/download/ragene10st.db.rar
in case anyone else is interested in using it.
Sebastien
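
For readers facing the same "mrna_assignment" problem described in the quoted message below, here is a minimal sketch of one way to reduce such a field to a single RefSeq accession per probe set. It assumes that entries are separated by "///", that fields within an entry are separated by "//", and that the accession is the first field. The helper name extract_refseq, the placeholder probe set IDs, and the column names are hypothetical and are not part of AnnotationDbi. This is not necessarily how the package linked above was built.

```r
# Hypothetical helper (not an AnnotationDbi function): return the unique
# RefSeq transcript accessions found in one "mrna_assignment" string.
extract_refseq <- function(x) {
  # Assumption: entries are separated by "///", fields by "//".
  entries <- strsplit(x, "///", fixed = TRUE)[[1]]
  # The accession is taken to be the first field of each entry.
  acc <- trimws(sub("//.*$", "", entries))
  # Keep only NM_/NR_/XM_/XR_ style accessions, dropping duplicates.
  unique(acc[grepl("^[NX][MR]_[0-9]+$", acc)])
}

# The two strings come from the examples quoted in this thread; the third
# row stands for an empty field. The probe set IDs are placeholders only.
annot <- data.frame(
  probeset_id = c("probeset_A", "probeset_B", "probeset_C"),
  mrna_assignment = c(
    paste("NM_001099458 // RefSeq // Rattus norvegicus similar to putative",
          "pheromone receptor (RGD1564110), mRNA. // chr1 // 49 // 74 //",
          "19 // 39 // 0"),
    paste("NM_001099461 // Rn.217622 // --- /// NM_001099461 // Rn.217622",
          "// --- /// ENSRNOT00000041455 // Rn.217622 // --- ///",
          "ENSRNOT00000046204 // Rn.217623 // ---"),
    "---"
  ),
  stringsAsFactors = FALSE
)

# First RefSeq accession per probe set, or NA when none is present.
first_refseq <- vapply(
  annot$mrna_assignment,
  function(s) {
    hits <- extract_refseq(s)
    if (length(hits) > 0) hits[1] else NA_character_
  },
  character(1),
  USE.NAMES = FALSE
)

# Two-column probe-to-accession table, without the unmapped rows.
mapping <- data.frame(probe = annot$probeset_id, refseq = first_refseq,
                      stringsAsFactors = FALSE)
mapping <- mapping[!is.na(mapping$refseq), ]

# Write it tab-delimited without a header (here to a temporary file).
out_file <- tempfile(fileext = ".txt")
write.table(mapping, file = out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

A table of this shape maps each probe set to one accession rather than to a whole composite string. Taking the first accession is a simplification: a probe set that lists several distinct transcripts would need a more careful rule.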
Hooiveld, Guido wrote:
> Hi Sebastien,
> To follow up on and clarify Manhong's remarks:
> Philip, my colleague, prepared the annotation files for many of the
> Entrez-based remapped CDF files.
> The remapping of the probes was done by Manhong et al. at the MBNI,
> and the mapped Entrez IDs are then used by Philip to create the
> corresponding annotation files (using the annotation/SQLForge library),
> which are made available through the link you mentioned below.
>
> HTH,
> Guido
>
>
>
>
>> -----Original Message-----
>> From: bioconductor-bounces at stat.math.ethz.ch
>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of
>> Manhong Dai
>> Sent: 03 March 2009 15:36
>> To: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] ragene10st
>>
>> Hi Sebastien,
>>
>>
>> Custom CDF version 11 is at
>> http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp#v11
>>
>> If you prefer entrez gene based cdf, it is at
>> http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/11.0.1/entrezg.asp
>> then search RaGene10stv1 in the page.
>>
>>
>> In the entrezg custom CDF, the probeset ID is already the
>> Entrez Gene ID. That is why the probeset IDs in the NuGO
>> Custom CDF version 10 annotation package are not the same as
>> the probeset IDs in Affymetrix's original CDF file.
>>
>>
>> Best,
>> Manhong
>>
>>
>>> Date: Tue, 03 Mar 2009 16:08:33 +1100
>>> From: Sebastien Gerega <seb at gerega.net>
>>> Subject: Re: [BioC] ragene10st
>>> To: bioconductor at stat.math.ethz.ch
>>> Message-ID: <49ACBB51.8070904 at gerega.net>
>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>
>>> Thank you Marc and Manhong for your suggestions.
>>> I have attempted both methods and run into some problems. Firstly, I
>>> was able to build ragene10st.db using the following code:
>>>
>>> source("http://bioconductor.org/biocLite.R")
>>> biocLite("rat.db0")
>>>
>>> library(AnnotationDbi)
>>> fname = "RaGene-1_0-st-v1.EDITED.txt"
>>> wdir = getwd()
>>> makeRATCHIP_DB(affy=FALSE,
>>>                prefix="ragene10st",
>>>                fileName=fname,
>>>                baseMapType="eg",
>>>                outputDir = wdir,
>>>                version="1.0.0",
>>>                manufacturer = "Affymetrix",
>>>                chipName = "Rat Gene ST Array",
>>>                manufacturerUrl = "http://www.affymetrix.com")
>>>
>>> I then used this library for annotation of an analysis I performed. At
>>> this point I realised that about one third of the 29171 probes were
>>> assigned the gene symbol "RT1-C113". I realise this is due to the
>>> annotation file used being in the wrong format. I had used the
>>> "mrna_assignment" column which contains data appearing in a complex
>>> format. Here are a couple examples:
>>> NM_001099458 // RefSeq // Rattus norvegicus similar to putative
>>> pheromone receptor (RGD1564110), mRNA. // chr1 // 49 // 74 // 19 // 39
>>> // 0 ///
>>> ENSRNOT00000046204 // Rn.217623 // ---
>>> NM_001099461 // Rn.217622 // --- /// NM_001099461 // Rn.217622 // ---
>>> /// ENSRNOT00000041455 // Rn.217622 // --- /// ENSRNOT00000046204 //
>>> Rn.217623 // ---
>>>
>>> Unfortunately for the Gene ST chips there are no columns that simply
>>> contain genbank, unigene, or refseq IDs.
>>>
>>> So instead I tried Manhong's suggestion of using a custom CDF but
>>> there is no custom CDF for rat gene ST arrays on the
>>> http://brainarray.mbni.med.umich.edu/ website. However, if I follow
>>> the link to http://nugo-r.bioinformatics.nl/NuGO_R.html I am able to
>>> locate an appropriate CDF. Unfortunately, upon further examination of
>>> this CDF package it appears as though the wrong probe IDs have been used.
>>> For example:
>>> > as.list(ragene10stv1rnentrezgSYMBOL)[1:5]
>>> $`112400_at`
>>> [1] "Nrg1"
>>>
>>> $`113882_at`
>>> [1] "Hemgn"
>>>
>>> $`113886_at`
>>> [1] "Kif1c"
>>>
>>> $`113892_at`
>>> [1] "Cml3"
>>>
>>> As far as I am aware the probe IDs used for rat gene ST arrays are in
>>> the following format (8 digits without "_at"):
>>> 10700001
>>> 10700003
>>> 10700004
>>> 10700005
>>> 10700013
>>>
>>> Can anyone provide any advice for either of the two options?
>>> thanks,
>>> Sebastien
>>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>
>
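
On the two identifier styles discussed in this thread: per Manhong's explanation, the probe set IDs in the Entrez-based custom CDF are Entrez Gene IDs, shown in the listing with an "_at" suffix, whereas the Affymetrix IDs for the rat Gene ST array are eight-digit numbers. A minimal sketch of telling the two apart and recovering the Entrez Gene ID, using only the IDs quoted above (the function name is_affy_style is hypothetical):

```r
# Probe set IDs from the Entrez-based custom CDF, as listed in the thread.
cdf_ids <- c("112400_at", "113882_at", "113886_at", "113892_at")

# Affymetrix-style IDs for the rat Gene ST array, as listed in the thread.
affy_ids <- c("10700001", "10700003", "10700004", "10700005", "10700013")

# Assumption drawn from the thread: dropping the "_at" suffix from a
# custom CDF probe set ID leaves the Entrez Gene ID.
entrez_ids <- sub("_at$", "", cdf_ids)

# Hypothetical check: TRUE for IDs that are exactly eight digits.
is_affy_style <- function(ids) grepl("^[0-9]{8}$", ids)

print(entrez_ids)
print(is_affy_style(affy_ids))
print(is_affy_style(cdf_ids))
```

A check like this only identifies which naming scheme a package uses. It does not map one scheme to the other, since the custom CDF regroups probes by gene.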