[BioC] problem with rat database
Alberto Goldoni
alberto.goldoni1975 at gmail.com
Tue May 10 23:35:40 CEST 2011
Very clear.
Thanks Marc.
2011/5/10 Marc Carlson <mcarlson at fhcrc.org>:
> Hi Alberto,
>
> The annotation packages take the probe-to-gene mapping supplied by the
> manufacturer and return the gene information associated with the gene that
> a given probe maps to.
>
> If there was no mapping between the probe and the gene provided by the
> manufacturer, we have not attempted to guess one for you.
>
> But if you feel that you have better information about where these probes
> map to (perhaps you took the time to align them to the genome and see what
> genes are nearby as you did for this one here), then you could supply that
> mapping to the SQLForge code in the AnnotationDbi package and produce a new
> annotation package based on that. The details on how to do this are
> described in one of the vignettes from the AnnotationDbi package:
>
> http://www.bioconductor.org/packages/2.8/bioc/vignettes/AnnotationDbi/inst/doc/SQLForge.pdf
>
> I hope this clarifies things,
>
>
> Marc
>
>
>
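The SQLForge route described above could look roughly like the sketch below. It assumes the curated mapping is a two-column, tab-delimited file (probe ID, gene ID) with no header, and that the build uses makeDBPackage() with the "RATCHIP_DB" schema; in current Bioconductor releases the SQLForge code lives in AnnotationForge rather than AnnotationDbi. The gene IDs, the package prefix "myrgug4130a" and the chip name are placeholders invented for illustration, and the build step is switched off by default because it needs the rat.db0 base package.

```r
# Hand-curated probe-to-gene mapping: column 1 = probe ID, column 2 = gene ID.
# The gene IDs are placeholders for illustration, not real Entrez Gene IDs.
probe_map <- data.frame(
  probe = c("A_43_P10328", "A_42_P552092"),
  gene  = c("1001", "1002"),
  stringsAsFactors = FALSE
)

# Write the mapping as a tab-delimited file without header or row names.
map_file <- tempfile(fileext = ".txt")
write.table(probe_map, file = map_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# The build needs AnnotationForge and rat.db0, so it is off by default.
build_package <- FALSE
if (build_package && requireNamespace("AnnotationForge", quietly = TRUE)) {
  AnnotationForge::makeDBPackage(
    "RATCHIP_DB",
    affy = FALSE,
    prefix = "myrgug4130a",          # hypothetical package prefix
    fileName = map_file,
    baseMapType = "eg",              # second column holds Entrez Gene IDs
    outputDir = tempdir(),
    version = "1.0.0",
    manufacturer = "Agilent",
    chipName = "rgug4130a (custom mapping)",
    manufacturerUrl = "http://www.agilent.com"
  )
}
```

Setting build_package to TRUE should write the package source under outputDir, from where it can be installed like any other source package.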
> On 05/10/2011 05:34 AM, Alberto Goldoni wrote:
>>
>> @Davis
>>
>> You are right! But I have tried to perform this kind of search:
>>
>> library("rgug4130a.db")
>> x<- rgug4130aENSEMBL
>> mapped_genes<- mappedkeys(x)
>> xx<- as.list(x[mapped_genes])
>>
>> or this approach:
>>
>> x<- rgug4130aGENENAME
>> mapped_probes<- mappedkeys(x)
>> xx<- as.list(x[mapped_probes])
>>
>> but the results are the same: for some genes the description is still
>> "unknown function".
>>
>> I would like to know whether there is a way to perform the search using
>> another database, directly against the Rat Genome Database, or with
>> biomaRt, but I don't know how.
>> I have roughly 100 genes with an "unknown function" description, and a
>> script or function that does the lookup automatically would be very
>> useful, instead of searching for the genes one by one.
>>
>>
>> Best regards.
>>
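One way to automate the lookup asked about above is to query Ensembl by genomic region through biomaRt. The sketch below is only an outline under assumptions: the helper parse_location() is hypothetical, the location string is the one reported for CB606456 in the original post, and coordinates from 2011 will generally not match the assembly that Ensembl serves today. The query itself is switched off by default because it needs biomaRt and a network connection.

```r
# Split "chromosome:start-end" strings into the pieces biomaRt filters expect.
parse_location <- function(id, loc) {
  m <- regmatches(loc, regexec("^([^:]+):([0-9]+)-([0-9]+)$", loc))
  data.frame(
    id         = id,
    chromosome = vapply(m, function(x) x[2], character(1)),
    start      = as.numeric(vapply(m, function(x) x[3], character(1))),
    end        = as.numeric(vapply(m, function(x) x[4], character(1))),
    stringsAsFactors = FALSE
  )
}

# Location reported in the original post for CB606456.
regions <- parse_location("CB606456", "7:16261621-16262210")

# Query Ensembl once per region for genes overlapping it.
query_ensembl <- FALSE  # needs biomaRt and a network connection
if (query_ensembl && requireNamespace("biomaRt", quietly = TRUE)) {
  mart <- biomaRt::useMart("ensembl", dataset = "rnorvegicus_gene_ensembl")
  hits <- lapply(seq_len(nrow(regions)), function(i) {
    biomaRt::getBM(
      attributes = c("ensembl_gene_id", "external_gene_name", "description"),
      filters    = c("chromosome_name", "start", "end"),
      values     = list(regions$chromosome[i], regions$start[i], regions$end[i]),
      mart       = mart
    )
  })
  names(hits) <- regions$id
}
```

For the full list of about 100 probes, the same helper can take vectors of IDs and location strings, so the loop runs over all regions in one go.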
>> 2011/5/10 Sean Davis<sdavis2 at mail.nih.gov>:
>>>
>>> On Tue, May 10, 2011 at 8:17 AM, Alberto Goldoni
>>> <alberto.goldoni1975 at gmail.com> wrote:
>>>>
>>>> @Vincent
>>>>
>>>> The chip used is the "rgug4130a", so I have to use the "rgug4130a.db"
>>>> database.
>>>>
>>>> To obtain the topTable, this is my history:
>>>>
>>>> library(limma)
>>>> library(vsn)
>>>> targets<- readTargets("targets.txt")
>>>> RG<- read.maimages(targets$FileName, source="agilent")
>>>> MA<- normalizeBetweenArrays(RG, method="Aquantile")
>>>> contrast.matrix<- cbind("(hda+str)-(ref)"=c(1,0),
>>>>                         "(ref+str)-(ref)"=c(0,1),
>>>>                         "(hda+str)-(ref+str)"=c(1,-1))
>>>> rownames(contrast.matrix)<- colnames(design)
>>>> fit<- lmFit(MA, design)
>>>> fit2<- contrasts.fit(fit, contrast.matrix)
>>>> fit2<- eBayes(fit2)
>>>> geni500<-topTable(fit2,number=500,adjust="BH")
>>>>
>>> Hi, Alberto.
>>> The data in your topTable result are taken from the feature extraction
>>> result file. In other words, rgug4130a.db is not used in what you show
>>> above. You could add to your annotation using either rgug4130a.db or
>>> biomaRt, but you will need to perform these steps yourself. As to why some
>>> of your probes do not appear to have annotation, you would probably need to
>>> contact Agilent as they are the source of your current annotation.
>>> Hope that helps,
>>> Sean
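Adding annotation from rgug4130a.db to the topTable result, as suggested above, might be sketched as follows. The helper annotate_table() is hypothetical, the two-row data frame only mimics the relevant topTable columns, and the fallback lookup (used when the package is not installed) contains made-up values so that the snippet runs on its own.

```r
# Attach gene symbols to a topTable-style data frame, keyed on ProbeName.
# `symbols` is a named list: names are probe IDs, values are gene symbols.
annotate_table <- function(tab, symbols) {
  first <- vapply(symbols, function(s) as.character(s)[1], character(1))
  tab$Symbol <- unname(first[tab$ProbeName])
  tab
}

# Two rows in the shape of the topTable output quoted in this thread.
top <- data.frame(
  ProbeName   = c("A_43_P10328", "A_42_P552092"),
  Description = c("unknown function", "Rat c-fos mRNA."),
  stringsAsFactors = FALSE
)

if (requireNamespace("rgug4130a.db", quietly = TRUE)) {
  library(rgug4130a.db)
  # mget() returns NA for probes that have no mapping in the package.
  symbols <- mget(top$ProbeName, rgug4130aSYMBOL, ifnotfound = NA)
} else {
  # Stand-in lookup with made-up values, used only when the package is absent.
  symbols <- list(A_43_P10328 = NA, A_42_P552092 = "Fos")
}

top <- annotate_table(top, symbols)
print(top)
```

Probes that the manufacturer did not map come back as NA here as well, since the package does not guess a mapping on its own.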
>>>
>>>> > sessionInfo()
>>>>
>>>> R version 2.12.1 (2010-12-16)
>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_United Kingdom.1252
>>>> [2] LC_CTYPE=English_United Kingdom.1252
>>>> [3] LC_MONETARY=English_United Kingdom.1252
>>>> [4] LC_NUMERIC=C
>>>> [5] LC_TIME=English_United Kingdom.1252
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] AnnotationDbi_1.12.0 Biobase_2.10.0 limma_3.6.9
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] DBI_0.2-5 RSQLite_0.9-4 tools_2.12.1
>>>>
>>>>
>>>>
>>>> 2011/5/10 Vincent Carey<stvjc at channing.harvard.edu>:
>>>>>
>>>>> 1) you did not provide sessionInfo(), which is critical for helping
>>>>> you to diagnose an issue that may pertain to software version --
>>>>> revisions to annotation packages can have all sorts of consequences
>>>>>
>>>>> 2) i am not sure rgug4130a.db has anything to do with this.
>>>>>
>>>>> > get("CB606456", revmap(rgug4130aSYMBOL))
>>>>>
>>>>> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
>>>>> value for "CB606456" not found
>>>>>
>>>>>
>>>>> and so on. look at the featureData component of the object passed to
>>>>> lmFit -- the annotation may be in there. if this does not clarify
>>>>> things, please give a very explicit indication of how the topTable was
>>>>> generated, going back to the structure of the object passed to lmFit
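For objects read with limma's read.maimages(), the per-probe annotation from the Agilent file is normally kept in the `genes` component (for example MA$genes), and topTable() reports those columns. A minimal sketch for checking which annotation columns are present and pulling out the unannotated probes; the small data frame is a stand-in for MA$genes, built from the two probes quoted below so that the snippet runs on its own.

```r
# Stand-in for MA$genes, using the two probes quoted in the original post.
genes <- data.frame(
  ProbeName   = c("A_43_P10328", "A_42_P552092"),
  GeneName    = c("CB606456", "203358_Rn"),
  Description = c("unknown function", "Rat c-fos mRNA."),
  stringsAsFactors = FALSE
)

# Which annotation columns travelled with the data?
print(names(genes))

# Probes carrying the "unknown function" label from the feature extraction file.
unknown <- genes[genes$Description == "unknown function", , drop = FALSE]
print(unknown$ProbeName)
```

With the real data, replacing `genes` with MA$genes (or fit2$genes) gives the list of probes that still need a lookup.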
>>>>>
>>>>> On Tue, May 10, 2011 at 5:30 AM, Alberto Goldoni
>>>>> <alberto.goldoni1975 at gmail.com> wrote:
>>>>>>
>>>>>> Dear All,
>>>>>> I'm analyzing Agilent microarrays with the "rgug4130a.db" database, and
>>>>>> using the function "topTable(fit2,number=500,adjust="BH")" I have
>>>>>> obtained 500 genes like these:
>>>>>>
>>>>>>                      16096             8109
>>>>>> Row                  79                40
>>>>>> Col                  38                109
>>>>>> ProbeUID             15309             7609
>>>>>> ControlType          0                 0
>>>>>> ProbeName            A_43_P10328       A_42_P552092
>>>>>> GeneName             CB606456          203358_Rn
>>>>>> SystematicName       CB606456          203358_Rn
>>>>>> Description          unknown function  Rat c-fos mRNA.
>>>>>> X.hda.str...ref.     3.988290607       5.670956889
>>>>>> X.ref.str...ref.     -0.951656306      4.413365374
>>>>>> X.hda.str...ref.str. 4.939946913       1.257591514
>>>>>> AveExpr              10.29735936       13.47699544
>>>>>> F                    36.77263264       33.20342601
>>>>>> P.Value              0.000212298       0.000292278
>>>>>> adj.P.Val            0.641094595       0.641094595
>>>>>>
>>>>>> but as you can see, for many genes, like the first one (CB606456), the
>>>>>> Description column says "unknown function".
>>>>>>
>>>>>> So I performed a very simple search.
>>>>>> 1) First, in Ensembl, using the GeneName "CB606456" with "Locations
>>>>>> of DnaAlignFeature", I got the genomic location (strand): chr
>>>>>> 7:16261621-16262210
>>>>>> 2) Then in the Rat Genome Database
>>>>>> (http://rgd.mcw.edu/tools/genes/genes_view.cgi?id=735058) I found
>>>>>> that there is one gene at this position:
>>>>>>
>>>>>> 735058 GENE Angptl4 angiopoietin-like 4 7 16261623 16267852
>>>>>>
>>>>>> So the question is: why does R report "unknown function" from the
>>>>>> "rgug4130a.db" database, when the genomic location from Ensembl,
>>>>>> looked up in RGD, gives me the Angptl4 gene?
>>>>>>
>>>>>> And is there a function that makes R perform this kind of search
>>>>>> automatically? (Of my 500 genes, 100 are "unknown function" genes, so
>>>>>> a function that performs this search automatically would be very
>>>>>> useful.)
>>>>>>
>>>>>>
>>>>>> Best regards to all, and to whoever answers me.
>>>>>>
>>>>>> --
>>>>>> -----------------------------------------------------
>>>>>> Dr. Alberto Goldoni
>>>>>> Parma, Italy
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>
>>>
>>
>>
>
--
-----------------------------------------------------
Dr. Alberto Goldoni
Parma, Italy