[BioC] ragene10st

Thu Mar 5 23:30:09 CET 2009

Thanks to all those that offered advice. I have now managed to create an 
annotation file for rat gene ST arrays. It can be downloaded from:
http://sydneybioinformatics.org/download/ragene10st.db.rar
in case anyone else is interested in using it.
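
For anyone who runs into the mrna_assignment problem quoted below, here is
a minimal sketch (not necessarily how the package above was built) of
pulling a single RefSeq accession out of each mrna_assignment string. It
assumes that entries are separated by "///" and fields within an entry by
"//", as in the examples quoted below. first_refseq is a hypothetical
helper name, not an AnnotationDbi function.

```r
# Hypothetical helper (not part of AnnotationDbi): return the first RefSeq
# transcript accession found in an Affymetrix "mrna_assignment" string,
# or NA if the string contains none.
first_refseq <- function(x) {
  # Assumption: entries are separated by "///", fields by "//".
  entries <- strsplit(x, "///", fixed = TRUE)[[1]]
  # The accession is the first field of each entry.
  acc <- trimws(sub("//.*$", "", entries))
  # Keep only RefSeq-style accessions (NM_, NR_, XM_, XR_).
  hit <- grep("^[NX][MR]_[0-9]+$", acc, value = TRUE)
  if (length(hit) > 0) hit[1] else NA_character_
}

# Two example cells: one taken from the thread, one with no assignment.
x <- c(
  paste("NM_001099458 // RefSeq // Rattus norvegicus similar to putative",
        "pheromone receptor (RGD1564110), mRNA. // chr1 // 49 // 74 // 19 // 39",
        "// 0 /// ENSRNOT00000046204 // Rn.217623 // ---"),
  "---"
)
ids <- vapply(x, first_refseq, character(1), USE.NAMES = FALSE)
print(ids)
```

The resulting probe ID / RefSeq pairs could then be written to a
two-column, tab-delimited file for makeRATCHIP_DB, with baseMapType set to
match the ID type (presumably "refseq" rather than "eg"; the SQLForge
vignette is the place to confirm this).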

Sebastien

Hooiveld, Guido wrote:
> Hi Sebastien,
> To follow up on and clarify Manhong's remarks:
> Philip, my colleague, prepared the annotation files for many of the
> Entrez-based remapped CDF files.
> The remapping of the probes was done by Manhong et al. at the MBNI,
> and the mapped Entrez IDs are then used by Philip to create the
> corresponding annotation files (using the annotation/SQLForge library),
> which are made available through the link you mentioned below.
>
> HTH,
> Guido
>
>  
>
>   
>> -----Original Message-----
>> From: bioconductor-bounces at stat.math.ethz.ch 
>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of 
>> Manhong Dai
>> Sent: 03 March 2009 15:36
>> To: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] ragene10st
>>
>> Hi Sebastien,
>>
>>
>> 	Custom CDF version 11 is at
>> http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp#v11
>>
>> 	If you prefer the Entrez Gene based CDF, it is at
>> http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/11.0.1/entrezg.asp
>> then search for RaGene10stv1 on that page.
>>
>>
>> 	In the entrezg custom CDF, the probeset ID is already the
>> Entrez Gene ID (with "_at" appended). That is why the probeset IDs
>> you saw in the NuGO Custom CDF version 10 annotation package are not
>> the same as the probeset IDs in Affymetrix's original CDF file.
>>
>>
>> Best,
>> Manhong
>>
>>     
>>> Date: Tue, 03 Mar 2009 16:08:33 +1100
>>> From: Sebastien Gerega <seb at gerega.net>
>>> Subject: Re: [BioC] ragene10st
>>> To: bioconductor at stat.math.ethz.ch
>>> Message-ID: <49ACBB51.8070904 at gerega.net>
>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>
>>> Thank you Marc and Manhong for your suggestions.
>>> I have attempted both methods and run into some problems. Firstly, I
>>> was able to build ragene10st.db using the following code:
>>>
>>> source("http://bioconductor.org/biocLite.R")
>>> biocLite("rat.db0")
>>>
>>> library(AnnotationDbi)
>>> fname = "RaGene-1_0-st-v1.EDITED.txt"
>>> wdir = getwd()   
>>> makeRATCHIP_DB(affy=FALSE,
>>>     prefix="ragene10st",
>>>     fileName=fname,
>>>     baseMapType="eg",
>>>     outputDir = wdir,
>>>     version="1.0.0",
>>>     manufacturer = "Affymetrix",
>>>     chipName = "Rat Gene ST Array",
>>>     manufacturerUrl = "http://www.affymetrix.com")
>>>
>>> I then used this library for annotation of an analysis I performed. At
>>> this point I realised that about one third of the 29171 probes were
>>> assigned the gene symbol "RT1-C113". I realise this is due to the
>>> annotation file used being in the wrong format. I had used the
>>> "mrna_assignment" column which contains data appearing in a complex
>>> format. Here are a couple examples:
>>> NM_001099458 // RefSeq // Rattus norvegicus similar to putative
>>> pheromone receptor (RGD1564110), mRNA. // chr1 // 49 // 74 // 19 // 39
>>> // 0 ///
>>> ENSRNOT00000046204 // Rn.217623 // ---
>>> NM_001099461 // Rn.217622 // --- /// NM_001099461 // Rn.217622 // ---
>>> /// ENSRNOT00000041455 // Rn.217622 // --- /// ENSRNOT00000046204 //
>>> Rn.217623 // ---
>>>
>>> Unfortunately for the Gene ST chips there are no columns that simply
>>> contain genbank, unigene, or refseq IDs.
>>>
>>> So instead I tried Manhong's suggestion of using a custom CDF but
>>> there is no custom CDF for rat gene ST arrays on the
>>> http://brainarray.mbni.med.umich.edu/ website. However, if I follow
>>> the link to http://nugo-r.bioinformatics.nl/NuGO_R.html I am able to
>>> locate an appropriate CDF. Unfortunately, upon further examination of
>>> this CDF package it appears as though the wrong probe IDs have been used.
>>     
>>> For example:
>>>  > as.list(ragene10stv1rnentrezgSYMBOL)[1:5]
>>> $`112400_at`
>>> [1] "Nrg1"
>>>
>>> $`113882_at`
>>> [1] "Hemgn"
>>>
>>> $`113886_at`
>>> [1] "Kif1c"
>>>
>>> $`113892_at`
>>> [1] "Cml3"
>>>
>>> As far as I am aware the probe IDs used for rat gene ST arrays are in
>>> the following format (8 digits without "_at"):
>>> 10700001
>>> 10700003
>>> 10700004
>>> 10700005
>>> 10700013
>>>
>>> Can anyone provide any advice for either of the two options?
>>> thanks,
>>> Sebastien
>>>       
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>     