[BioC] problem with rat database
Marc Carlson
mcarlson at fhcrc.org
Tue May 10 20:38:41 CEST 2011
Hi Alberto,
The annotation packages take a probe-to-gene mapping from the manufacturer
and return the gene information associated with the gene that each probe
maps to.
If the manufacturer provided no mapping between a probe and a gene, we have
not attempted to guess one for you.
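To illustrate, here is a minimal sketch of that lookup. It assumes rgug4130a.db is installed and that the keys are the Agilent probe names (the ProbeName column of the topTable output quoted further down). Probes without a manufacturer mapping should come back as NA rather than as a guessed gene:

```r
# Sketch: look up symbol and gene name for a few probes.
# The two probe IDs are the ones from the quoted topTable output.
library(rgug4130a.db)

probes <- c("A_43_P10328", "A_42_P552092")

# mget() on these maps returns one list element per key; ifnotfound = NA
# gives NA instead of an error for keys that have no mapping.
symbols   <- mget(probes, rgug4130aSYMBOL,   ifnotfound = NA)
genenames <- mget(probes, rgug4130aGENENAME, ifnotfound = NA)

# Collapse multiple hits per probe into one string; keep NA as NA.
collapse <- function(x) {
  vapply(x, function(v) {
    if (all(is.na(v))) NA_character_ else paste(v, collapse = "; ")
  }, character(1))
}

annot <- data.frame(ProbeName = probes,
                    Symbol    = collapse(symbols),
                    GeneName  = collapse(genenames),
                    row.names = NULL,
                    stringsAsFactors = FALSE)
annot
```

The resulting data frame can then be merged with a topTable result on the ProbeName column.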
But if you feel that you have better information about where these
probes map (perhaps you took the time to align them to the genome and
see which genes are nearby, as you did for this one), then you could
supply that mapping to the SQLForge code in the AnnotationDbi package
and produce a new annotation package based on it. The details on how
to do this are described in one of the vignettes from the
AnnotationDbi package:
http://www.bioconductor.org/packages/2.8/bioc/vignettes/AnnotationDbi/inst/doc/SQLForge.pdf
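As a rough sketch only, a call along these lines would build such a package. It follows the argument names used in the vignette example and assumes the rat.db0 base package is installed. The mapping file name and the package prefix are hypothetical placeholders, so check ?makeDBPackage for your version before relying on it:

```r
# Sketch: build a custom chip annotation package with SQLForge.
# Assumes AnnotationDbi and the rat.db0 base package are installed.
library(AnnotationDbi)

# Hypothetical two-column, tab-delimited file with no header:
# probe ID in column 1, Entrez Gene ID in column 2. You create this
# file yourself from your own probe alignments.
map_file <- "my_probe_to_entrez.txt"

makeDBPackage("RATCHIP_DB",
              affy            = FALSE,              # not an Affymetrix library file
              prefix          = "rgug4130aCustom",  # hypothetical package prefix
              fileName        = map_file,
              baseMapType     = "eg",               # column 2 holds Entrez Gene IDs
              outputDir       = tempdir(),
              version         = "1.0.0",
              manufacturer    = "Agilent",
              chipName        = "rgug4130a custom mapping",
              manufacturerUrl = "http://www.agilent.com")
```

The generated source package is written to the output directory and can then be installed like any other source package.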
I hope this clarifies things,
Marc
On 05/10/2011 05:34 AM, Alberto Goldoni wrote:
> @Davis
>
> You are right! But I have tried to perform this kind of search:
>
> library("rgug4130a.db")
> x<- rgug4130aENSEMBL
> mapped_genes<- mappedkeys(x)
> xx<- as.list(x[mapped_genes])
>
> or this approach:
>
> x<- rgug4130aGENENAME
> mapped_probes<- mappedkeys(x)
> xx<- as.list(x[mapped_probes])
>
> but the results are the same: for some genes there is still "unknown function".
>
> I would like to know if there is a way to perform the
> search using another database, or directly against the Rat Genome Database,
> or using biomaRt... but I don't know how.
> I have about 100 genes with an "unknown function", and it would
> be very useful to have a script or function that performs this
> automatically instead of searching for genes one by one.
>
>
> Best regards.
>
> 2011/5/10 Sean Davis<sdavis2 at mail.nih.gov>:
>>
>> On Tue, May 10, 2011 at 8:17 AM, Alberto Goldoni
>> <alberto.goldoni1975 at gmail.com> wrote:
>>> @Vincent
>>>
>>> The chip used is the "rgug4130a", so I have to use the "rgug4130a.db"
>>> database.
>>>
>>> To obtain the topTable, this is my history:
>>>
>>> library(limma)
>>> library(vsn)
>>> targets<- readTargets("targets.txt")
>>> RG<- read.maimages(targets$FileName, source="agilent")
>>> MA<- normalizeBetweenArrays(RG, method="Aquantile")
>>> contrast.matrix<- cbind("(hda+str)-(ref)"=c(1,0),
>>>     "(ref+str)-(ref)"=c(0,1),
>>>     "(hda+str)-(ref+str)"=c(1,-1))
>>> rownames(contrast.matrix)<- colnames(design)
>>> fit<- lmFit(MA, design)
>>> fit2<- contrasts.fit(fit, contrast.matrix)
>>> fit2<- eBayes(fit2)
>>> geni500<-topTable(fit2,number=500,adjust="BH")
>>>
>> Hi, Alberto.
>> The data in your topTable result are taken from the feature extraction
>> result file. In other words, rgug4130a.db is not used in what you show
>> above. You could add to your annotation using either rgug4130a.db or
>> biomaRt, but you will need to perform these steps yourself. As to why some
>> of your probes do not appear to have annotation, you would probably need to
>> contact Agilent as they are the source of your current annotation.
>> Hope that helps,
>> Sean
>>
>>>> sessionInfo()
>>> R version 2.12.1 (2010-12-16)
>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United Kingdom.1252
>>> [2] LC_CTYPE=English_United Kingdom.1252
>>> [3] LC_MONETARY=English_United Kingdom.1252
>>> [4] LC_NUMERIC=C
>>> [5] LC_TIME=English_United Kingdom.1252
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] AnnotationDbi_1.12.0 Biobase_2.10.0 limma_3.6.9
>>>
>>> loaded via a namespace (and not attached):
>>> [1] DBI_0.2-5 RSQLite_0.9-4 tools_2.12.1
>>>
>>>
>>>
>>> 2011/5/10 Vincent Carey<stvjc at channing.harvard.edu>:
>>>> 1) you did not provide sessionInfo(), which is critical for helping
>>>> you to diagnose an issue that may pertain to software version --
>>>> revisions to annotation packages can have all sorts of consequences
>>>>
>>>> 2) I am not sure rgug4130a.db has anything to do with this.
>>>>
>>>>> get("CB606456", revmap(rgug4130aSYMBOL))
>>>> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
>>>> value for "CB606456" not found
>>>>
>>>>
>>>> and so on. Look at the featureData component of the object passed to
>>>> lmFit -- the annotation may be in there. If this does not clarify
>>>> things, please give a very explicit indication of how the
>>>> topTable was generated, going back to the structure of the object
>>>> passed to lmFit.
>>>>
>>>> On Tue, May 10, 2011 at 5:30 AM, Alberto Goldoni
>>>> <alberto.goldoni1975 at gmail.com> wrote:
>>>>> Dear All,
>>>>> I'm analyzing Agilent microarrays with the "rgug4130a.db" database and,
>>>>> using the function "topTable(fit2,number=500,adjust="BH")", I have
>>>>> obtained 500 genes like these:
>>>>>
>>>>>       Row Col ProbeUID ControlType    ProbeName  GeneName SystematicName
>>>>> 16096  79  38    15309           0  A_43_P10328  CB606456       CB606456
>>>>> 8109   40 109     7609           0 A_42_P552092 203358_Rn      203358_Rn
>>>>>
>>>>>            Description X.hda.str...ref. X.ref.str...ref. X.hda.str...ref.str.
>>>>> 16096 unknown function      3.988290607     -0.951656306          4.939946913
>>>>> 8109   Rat c-fos mRNA.      5.670956889      4.413365374          1.257591514
>>>>>
>>>>>           AveExpr           F     P.Value   adj.P.Val
>>>>> 16096 10.29735936 36.77263264 0.000212298 0.641094595
>>>>> 8109  13.47699544 33.20342601 0.000292278 0.641094595
>>>>>
>>>>> but as you can see, for many genes, like the first one (CB606456), the
>>>>> Description says "unknown function".
>>>>>
>>>>> So I performed a very simple search.
>>>>> 1) First, in Ensembl, using the GeneName "CB606456" with the "Locations
>>>>> of DnaAlignFeature", it gives me the genomic location (strand): chr
>>>>> 7:16261621-16262210
>>>>> 2) Then in the Rat Genome Database
>>>>> (http://rgd.mcw.edu/tools/genes/genes_view.cgi?id=735058) I found
>>>>> that at this position there is one gene:
>>>>>
>>>>> 735058 GENE Angptl4 angiopoietin-like 4 7 16261623 16267852
>>>>>
>>>>> So the question is: why does the "rgug4130a.db" database in R
>>>>> give me "unknown function", when the genomic location from
>>>>> Ensembl, looked up in RGD, gives me the Angptl4 gene?
>>>>>
>>>>> And is there a function that lets R perform this kind of
>>>>> search automatically? (I ask because among my 500 genes there are 100
>>>>> "unknown function" genes, and it would be useful to have a function that
>>>>> performs this kind of search automatically.)
>>>>>
>>>>>
>>>>> Best regards to all, and to whoever answers me.
>>>>>
>>>>> --
>>>>> -----------------------------------------------------
>>>>> Dr. Alberto Goldoni
>>>>> Parma, Italy
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>
>>>
>>
>
>