[BioC] ragene10st
Sebastien Gerega
seb at gerega.net
Tue Mar 3 06:08:33 CET 2009
Thank you Marc and Manhong for your suggestions.
I have attempted both methods and run into some problems. Firstly, I was
able to build ragene10st.db using the following code:
source("http://bioconductor.org/biocLite.R")
biocLite("rat.db0")
library(AnnotationDbi)
fname = "RaGene-1_0-st-v1.EDITED.txt"
wdir = getwd()
makeRATCHIP_DB(affy=FALSE,
               prefix="ragene10st",
               fileName=fname,
               baseMapType="eg",
               outputDir = wdir,
               version="1.0.0",
               manufacturer = "Affymetrix",
               chipName = "Rat Gene ST Array",
               manufacturerUrl = "http://www.affymetrix.com")
I then used this library to annotate an analysis I performed. At
this point I realised that about one third of the 29171 probes were
assigned the gene symbol "RT1-C113". I realise this is because the
annotation file I used was in the wrong format: I had used the
"mrna_assignment" column, which contains data in a complex
format. Here are a couple of examples:
NM_001099458 // RefSeq // Rattus norvegicus similar to putative
pheromone receptor (RGD1564110), mRNA. // chr1 // 49 // 74 // 19 // 39
// 0 ///
ENSRNOT00000046204 // Rn.217623 // ---
NM_001099461 // Rn.217622 // --- /// NM_001099461 // Rn.217622 // ---
/// ENSRNOT00000041455 // Rn.217622 // --- /// ENSRNOT00000046204 //
Rn.217623 // ---
Unfortunately, for the Gene ST chips there are no columns that simply
contain GenBank, UniGene, or RefSeq IDs.
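One possible workaround, sketched here rather than tested against the real annotation file, is to pull a single RefSeq accession out of each "mrna_assignment" entry and write a two-column tab-delimited file from that. The helper name first_refseq, the placeholder probeset IDs, and the "probeset_id" column name are assumptions made for illustration; the only format assumed is the one visible in the examples above (assignments separated by "///", fields within an assignment by "//").

```r
# Hypothetical helper (not part of AnnotationDbi): return the first RefSeq
# accession (NM_/NR_/XM_/XR_) found in one mrna_assignment entry, or NA.
first_refseq <- function(x) {
  # Assignments are separated by "///"; fields within one by "//".
  assignments <- strsplit(x, "///", fixed = TRUE)[[1]]
  ids <- vapply(strsplit(assignments, "//", fixed = TRUE),
                function(fields) fields[1], character(1))
  ids <- trimws(ids)
  hits <- grep("^[NX][MR]_[0-9]+$", ids, value = TRUE)
  if (length(hits) > 0) hits[1] else NA_character_
}

# Toy input: the probeset IDs are made-up placeholders; the second
# assignment string is copied from the example above.
anno <- data.frame(
  probeset_id = c("probe_A", "probe_B"),
  mrna_assignment = c(
    "---",
    "NM_001099461 // Rn.217622 // --- /// ENSRNOT00000041455 // Rn.217622 // ---"
  ),
  stringsAsFactors = FALSE
)

out <- data.frame(
  probe  = anno$probeset_id,
  refseq = vapply(anno$mrna_assignment, first_refseq, character(1),
                  USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

# Two-column, tab-delimited, no header; probes without a RefSeq hit keep NA.
outfile <- tempfile(fileext = ".txt")
write.table(out, file = outfile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

A file built this way holds RefSeq accessions rather than Entrez Gene IDs, so, as far as I understand the SQLForge documentation, baseMapType would need to be "refseq" instead of "eg". Taking only the first accession per probeset is a simplification; probesets that map to several transcripts may need a more careful rule.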
So instead I tried Manhong's suggestion of using a custom CDF, but there
is no custom CDF for rat gene ST arrays on the
http://brainarray.mbni.med.umich.edu/ website. However, if I follow the
link to http://nugo-r.bioinformatics.nl/NuGO_R.html I am able to locate
an appropriate CDF. Unfortunately, upon further examination of this CDF
package, it appears as though the wrong probe IDs have been used.
For example:
> as.list(ragene10stv1rnentrezgSYMBOL)[1:5]
$`112400_at`
[1] "Nrg1"
$`113882_at`
[1] "Hemgn"
$`113886_at`
[1] "Kif1c"
$`113892_at`
[1] "Cml3"
As far as I am aware the probe IDs used for rat gene ST arrays are in
the following format (8 digits without "_at"):
10700001
10700003
10700004
10700005
10700013
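The two ID styles can be told apart mechanically. The sketch below assumes, based only on the "entrezg" in the package name and not confirmed anywhere in this thread, that the custom CDF names its probesets as an Entrez Gene ID followed by "_at", which would explain why they do not match the 8-digit Affymetrix IDs. The variable names are illustrative.

```r
# IDs taken from the two listings above.
ids <- c("10700001", "10700003", "112400_at", "113882_at")

# Affymetrix Gene ST style: exactly 8 digits, no suffix.
is_gene_st <- grepl("^[0-9]{8}$", ids)

# Custom-CDF style (assumed): digits followed by "_at".
is_entrezg_style <- grepl("^[0-9]+_at$", ids)

# Under the assumption above, stripping "_at" would leave an Entrez Gene ID.
entrez_candidates <- sub("_at$", "", ids[is_entrezg_style])
```

If that assumption holds, the custom CDF groups probes by gene rather than by Affymetrix probeset, so its IDs would only be usable with data summarised through that same CDF.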
Can anyone provide any advice for either of the two options?
thanks,
Sebastien
Marc Carlson wrote:
> Well one way is to navigate Affymetrix's website and grab the annotation
> file
>
> http://www.affymetrix.com/support/technical/annotationfilesmain.affx
>
> Or you could also use Martin Morgan's clever AffyCompatible package, which
> will let you get the data you need more directly.
>
> ##The 2nd approach would go something like this (adapting from Martin's vignette):
> library(AffyCompatible)
> password <- "your_psswd"
> rsrc <- NetAffxResource(user="you at someplace.com", password=password)
> head(names(rsrc))
> affxDescription(rsrc[["RaGene-1_0-st-v1"]])
> annos <- rsrc[["RaGene-1_0-st-v1"]]
> annos
> sapply(affxAnnotation(annos), force)
> anno <- rsrc[["RaGene-1_0-st-v1", "Probeset Annotations, CSV Format"]]
> fl <- readAnnotation(rsrc, annotation=anno, content=FALSE)
> fl
> conn <- unz(fl, "RaGene-1_0-st-v1.na27.2.rn4.probeset.csv")
> ##Then get a dataframe with the contents of the file in it
> df = read.table(conn, header=TRUE, skip=18, sep=",")
>
>
>
> Marc
>
>
>
>
> Sebastien Gerega wrote:
>
>> Hi Marc,
>> I guess the problem lies in the fact that I don't know which
>> annotation file to use. I can't seem to find any that have the
>> appropriate columns. What files were used to generate mogene10st.db
>> and hugene10st.db? I can find appropriate annotations for Affy 3'
>> arrays but not for the Gene ST ones...
>> thanks again,
>> Sebastien
>>
>>
>> Marc Carlson wrote:
>>
>>> Hi Sebastien,
>>>
>>> The affy parameter is just a shortcut for Affymetrix expression
>>> arrays. If you want to use that parameter, you can download the
>>> appropriate annotation library file from the Affymetrix website (which
>>> you probably have to get anyhow), point to it in the parameter, and
>>> then call the function. SQLForge will then try to parse this file,
>>> extracting only the probeset IDs and the Entrez Gene, RefSeq, and
>>> UniGene IDs, in order to sort out what all these genes are and thus
>>> generate the files described in the vignette from this Affymetrix
>>> file. This will work as long as this particular annotation file is
>>> formatted similarly to what has come before. But really, this
>>> parameter is purely for convenience and not at all necessary for
>>> using SQLForge. A lot of people use affy, so I just added this to
>>> make it easier for that majority of users.
>>> You can almost as easily grab that same Affymetrix annotation
>>> library file and make the tab-delimited files that I described
>>> yourself. All you really need is a file that tells the gene identity of
>>> the different probesets, so you can ignore the vast majority of the
>>> data in the file. If you have that, then you have all that you really
>>> need to proceed. For most platforms this just means selecting two
>>> of the columns and creating a tab-delimited file from those. Then you
>>> feed that file to the function.
>>>
>>> Please let me know if you have more questions,
>>>
>>>
>>> Marc
>>>
>>>
>>>
>>>
>>>
>>> Sebastien Gerega wrote:
>>>
>>>
>>>> Hi Marc and thanks for your help. I've had a look at the SQLForge
>>>> vignette and there are still a couple issues that are unclear to me.
>>>> Firstly, for the Rat Gene ST arrays is it possible to use any of the
>>>> annotation files from the Affymetrix site as input for makeRATCHIP_DB
>>>> in AnnotationDbi? If not, and the list of probes has to be manually
>>>> created what is the best way to go about doing this?
>>>> thanks again,
>>>> Sebastien
>>>>
>>>>
>>>> Marc Carlson wrote:
>>>>
>>>>
>>>>> Hi Sebastien,
>>>>>
>>>>> We have just never had anyone ask for one before. However, you can
>>>>> make a package for yourself if you follow the instructions in the
>>>>> SQLForge vignette in the AnnotationDbi package:
>>>>>
>>>>> http://www.bioconductor.org/packages/devel/bioc/html/AnnotationDbi.html
>>>>>
>>>>>
>>>>> Please let me know if you have further questions regarding this.
>>>>>
>>>>> Marc
>>>>>
>>>>>
>>>>> Sebastien Gerega wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Hi,
>>>>>> I have been analysing human and mouse gene ST chips using a
>>>>>> combination of the Aroma package and the hugene10st.db and
>>>>>> mogene10st.db annotation packages. Now I am attempting to do the
>>>>>> same on some rat gene ST chips but have been unable to find the
>>>>>> corresponding annotations. Why is there no ragene10st?
>>>>>> thanks,
>>>>>> Sebastien
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor