[BioC] problem with rat database

Alberto Goldoni alberto.goldoni1975 at gmail.com
Tue May 10 23:35:40 CEST 2011


Very clear.

Thanks Marc.


2011/5/10 Marc Carlson <mcarlson at fhcrc.org>:
> Hi Alberto,
>
> So the way that the annotation packages work is that they take a probe to
> gene mapping from a manufacturer and then return to you the relevant gene
> information that is associated with the gene that is mapped to by a specific
> probe.
>
> If there was no mapping between the probe and the gene provided by the
> manufacturer, we have not attempted to guess one for you.
>
> But if you feel that you have better information about where these probes
> map to (perhaps you took the time to align them to the genome and see what
> genes are nearby, as you did for this one), then you could supply that
> mapping to the SQLForge code in the AnnotationDbi package and produce a new
> annotation package based on it.  The details on how to do this are
> described in one of the vignettes from the AnnotationDbi package:
>
> http://www.bioconductor.org/packages/2.8/bioc/vignettes/AnnotationDbi/inst/doc/SQLForge.pdf
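A minimal sketch of the kind of mapping file such a package build would start from. It assumes, based on the SQLForge vignette, a tab-delimited file with two columns and no header (probe ID, then gene identifier). The helper name write_probe_map and the gene ID "000000" are placeholders for illustration, not values from this thread, and the makeDBPackage call in the trailing comment is an assumption that should be checked against the vignette.

```r
# Hypothetical helper: write a two-column, tab-delimited probe-to-gene file.
write_probe_map <- function(probe_ids, gene_ids, file) {
  stopifnot(length(probe_ids) == length(gene_ids))
  map <- data.frame(probe = probe_ids, gene = gene_ids,
                    stringsAsFactors = FALSE)
  write.table(map, file = file, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(file)
}

map_file <- tempfile(fileext = ".txt")
write_probe_map(
  probe_ids = c("A_43_P10328"),
  gene_ids  = c("000000"),  # placeholder: replace with the real gene ID
  file      = map_file
)

# The file would then be handed to the SQLForge code, for example
# (not run here; all argument values below are assumptions):
# makeDBPackage("RATCHIP_DB", affy = FALSE, prefix = "myrgug4130a",
#               fileName = map_file, baseMapType = "eg",
#               outputDir = tempdir(), version = "0.0.1",
#               manufacturer = "Agilent", chipName = "custom rgug4130a")
```

Keeping the file-writing step separate makes it easy to inspect the mapping by eye before any package is built from it.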
>
> I hope this clarifies things,
>
>
>  Marc
>
>
>
> On 05/10/2011 05:34 AM, Alberto Goldoni wrote:
>>
>> @Davis
>>
>> You are right! But I have tried to perform this kind of search:
>>
>> library("rgug4130a.db")
>> x<- rgug4130aENSEMBL
>> mapped_genes<- mappedkeys(x)
>> xx<- as.list(x[mapped_genes])
>>
>> or this approach:
>>
>> x<- rgug4130aGENENAME
>> mapped_probes<- mappedkeys(x)
>> xx<- as.list(x[mapped_probes])
>>
>> but the results are the same: for some genes there is still "unknown function".
>>
>> I would like to know if there is a method to perform the
>> search using another database, directly against the Rat Genome Database,
>> or using biomaRt, but I don't know how.
>> I have roughly 100 genes with an "unknown function", and it would
>> be very useful to have a script or function that performs this search
>> automatically instead of searching genes one by one.
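A sketch of how many probes could be looked up in one call rather than one by one. It relies on mget(), which works on plain environments and, when AnnotationDbi is loaded, on annotation maps such as rgug4130aSYMBOL. The helper name lookup_probes and the toy map are assumptions for illustration; the toy value is not taken from rgug4130a.db.

```r
# Look up many probe IDs at once in an annotation map.
# `map` may be an AnnotationDbi map (e.g. rgug4130aSYMBOL) or, as in the
# toy example below, a plain environment used as a stand-in.
lookup_probes <- function(probes, map) {
  hits <- mget(probes, envir = map, ifnotfound = NA)
  # A probe can map to several values; collapse them into one string.
  vapply(hits, function(v) {
    if (all(is.na(v))) NA_character_ else paste(v, collapse = ";")
  }, character(1))
}

# Toy stand-in map; its content is illustrative only.
toy_map <- new.env()
assign("A_42_P552092", "Fos", envir = toy_map)

res <- lookup_probes(c("A_43_P10328", "A_42_P552092"), toy_map)
print(res)

# With the real package installed, the same helper should accept its maps.
if (requireNamespace("rgug4130a.db", quietly = TRUE)) {
  library("rgug4130a.db")
  print(lookup_probes(c("A_43_P10328", "A_42_P552092"), rgug4130aSYMBOL))
}
```

Probes without an entry come back as NA, which makes the unannotated ones easy to pick out for a second lookup elsewhere.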
>>
>>
>> Best regards.
>>
>> 2011/5/10 Sean Davis<sdavis2 at mail.nih.gov>:
>>>
>>> On Tue, May 10, 2011 at 8:17 AM, Alberto Goldoni
>>> <alberto.goldoni1975 at gmail.com>  wrote:
>>>>
>>>> @Vincent
>>>>
>>>> The chip used is the "rgug4130a", so I have to use the "rgug4130a.db"
>>>> database.
>>>>
>>>> To obtain the topTable, this is my history:
>>>>
>>>> library(limma)
>>>> library(vsn)
>>>> targets<- readTargets("targets.txt")
>>>> RG<- read.maimages(targets$FileName, source="agilent")
>>>> MA<- normalizeBetweenArrays(RG, method="Aquantile")
>>>> contrast.matrix<- cbind("(hda+str)-(ref)"=c(1,0),
>>>>                         "(ref+str)-(ref)"=c(0,1),
>>>>                         "(hda+str)-(ref+str)"=c(1,-1))
>>>> # design: the design matrix, created beforehand (not shown in this post)
>>>> rownames(contrast.matrix)<- colnames(design)
>>>> fit<- lmFit(MA, design)
>>>> fit2<- contrasts.fit(fit, contrast.matrix)
>>>> fit2<- eBayes(fit2)
>>>> geni500<-topTable(fit2,number=500,adjust="BH")
>>>>
>>> Hi, Alberto.
>>> The data in your topTable result are taken from the feature extraction
>>> result file.  In other words, rgug4130a.db is not used in what you show
>>> above.  You could add to your annotation using either rgug4130a.db or
>>> biomaRt, but you will need to perform these steps yourself.  As to why
>>> some of your probes do not appear to have annotation, you would probably
>>> need to contact Agilent as they are the source of your current annotation.
>>> Hope that helps,
>>> Sean
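A sketch of the "perform these steps yourself" part: filling in the rows of a topTable-like data frame that still say "unknown function" from a separately obtained annotation. The two rows are the ones quoted in this thread, reduced to two columns. The vector extra and its value are placeholders standing in for whatever a lookup in rgug4130a.db or biomaRt returns.

```r
# Toy version of the topTable output, using two rows quoted in this thread.
tt <- data.frame(
  ProbeName   = c("A_43_P10328", "A_42_P552092"),
  Description = c("unknown function", "Rat c-fos mRNA."),
  stringsAsFactors = FALSE
)

# Hypothetical extra annotation keyed by probe name; placeholder value.
extra <- c(A_43_P10328 = "my curated description")

# Fill in only the rows the manufacturer left unannotated, and keep the
# original text when the extra annotation has nothing for a probe.
unknown <- tt$Description == "unknown function"
found   <- unname(extra[tt$ProbeName[unknown]])
tt$Description[unknown] <- ifelse(is.na(found),
                                  tt$Description[unknown], found)
print(tt)
```

Keying on ProbeName rather than GeneName avoids clashes when several probes share the same manufacturer gene label.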
>>>
>>>>> sessionInfo()
>>>>
>>>> R version 2.12.1 (2010-12-16)
>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_United Kingdom.1252
>>>> [2] LC_CTYPE=English_United Kingdom.1252
>>>> [3] LC_MONETARY=English_United Kingdom.1252
>>>> [4] LC_NUMERIC=C
>>>> [5] LC_TIME=English_United Kingdom.1252
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> other attached packages:
>>>> [1] AnnotationDbi_1.12.0 Biobase_2.10.0       limma_3.6.9
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] DBI_0.2-5     RSQLite_0.9-4 tools_2.12.1
>>>>
>>>>
>>>>
>>>> 2011/5/10 Vincent Carey<stvjc at channing.harvard.edu>:
>>>>>
>>>>> 1) you did not provide sessionInfo(), which is critical for helping
>>>>> you to diagnose an issue that may pertain to software version --
>>>>> revisions to annotation packages can have all sorts of consequences
>>>>>
>>>>> 2) I am not sure rgug4130a.db has anything to do with this.
>>>>>
>>>>>> get("CB606456", revmap(rgug4130aSYMBOL))
>>>>>
>>>>> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
>>>>>  value for "CB606456" not found
>>>>>
>>>>>
>>>>> and so on.  Look at the featureData component of the object passed to
>>>>> lmFit -- the annotation may be in there.  If this does not clarify
>>>>> things, please give a very explicit indication of how the
>>>>> topTable was generated, going back to the structure of the object
>>>>> passed to lmFit.
>>>>>
>>>>> On Tue, May 10, 2011 at 5:30 AM, Alberto Goldoni
>>>>> <alberto.goldoni1975 at gmail.com>  wrote:
>>>>>>
>>>>>> Dear All,
>>>>>> I'm analyzing Agilent microarrays with the "rgug4130a.db" database, and
>>>>>> using the function call topTable(fit2,number=500,adjust="BH") I have
>>>>>> obtained 500 genes like these:
>>>>>>
>>>>>> Field                   First probe       Second probe
>>>>>> (row name)              16096             8109
>>>>>> Row                     79                40
>>>>>> Col                     38                109
>>>>>> ProbeUID                15309             7609
>>>>>> ControlType             0                 0
>>>>>> ProbeName               A_43_P10328       A_42_P552092
>>>>>> GeneName                CB606456          203358_Rn
>>>>>> SystematicName          CB606456          203358_Rn
>>>>>> Description             unknown function  Rat c-fos mRNA.
>>>>>> X.hda.str...ref.        3.988290607       5.670956889
>>>>>> X.ref.str...ref.        -0.951656306      4.413365374
>>>>>> X.hda.str...ref.str.    4.939946913       1.257591514
>>>>>> AveExpr                 10.29735936       13.47699544
>>>>>> F                       36.77263264       33.20342601
>>>>>> P.Value                 0.000212298       0.000292278
>>>>>> adj.P.Val               0.641094595       0.641094595
>>>>>>
>>>>>> but as you can see, for most genes like the first one (CB606456) the
>>>>>> Description column says "unknown function".
>>>>>>
>>>>>> So I performed a very simple search.
>>>>>> 1) First, in Ensembl, using the GeneName "CB606456" with the "Locations
>>>>>> of DnaAlignFeature" view, I got the genomic location (strand): chr
>>>>>> 7:16261621-16262210
>>>>>> 2) Then, in the Rat Genome Database
>>>>>> (http://rgd.mcw.edu/tools/genes/genes_view.cgi?id=735058), I found
>>>>>> that there is one gene at this position:
>>>>>>
>>>>>> 735058  GENE  Angptl4  angiopoietin-like 4  7  16261623  16267852
>>>>>>
>>>>>> So the question is: why does the "rgug4130a.db" database in R
>>>>>> give me "unknown function" when the genomic location from
>>>>>> Ensembl, looked up in RGD, gives me the Angptl4 gene?
>>>>>>
>>>>>> And is there a function that makes R perform this kind of
>>>>>> search automatically? (Of my 500 genes, about 100 are "unknown
>>>>>> function" genes, so it would be useful to have a function that
>>>>>> performs this kind of search automatically.)
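The manual location check described above can be sketched in base R as an interval-overlap test. The coordinates are the ones quoted in this message. The function name overlapping_genes and the data frame layout are assumptions for illustration, and in practice the gene table would come from a source such as RGD or biomaRt.

```r
# Probe alignment and gene coordinates quoted in this thread (rat chr 7).
probes <- data.frame(id = "CB606456", chr = "7",
                     start = 16261621, end = 16262210,
                     stringsAsFactors = FALSE)
genes  <- data.frame(symbol = "Angptl4", chr = "7",
                     start = 16261623, end = 16267852,
                     stringsAsFactors = FALSE)

# Two intervals on the same chromosome overlap when each one starts
# before the other one ends.
overlapping_genes <- function(probes, genes) {
  pairs <- merge(probes, genes, by = "chr",
                 suffixes = c(".probe", ".gene"))
  hit <- pairs$start.probe <= pairs$end.gene &
         pairs$start.gene  <= pairs$end.probe
  pairs[hit, c("id", "symbol")]
}

hits <- overlapping_genes(probes, genes)
print(hits)
```

For a few hundred probes this all-pairs merge is adequate; for genome-scale tables a dedicated tool such as findOverlaps() from the GenomicRanges package is likely the better choice.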
>>>>>>
>>>>>>
>>>>>> Best regards to all, and to whoever answers.
>>>>>>
>>>>>> --
>>>>>> -----------------------------------------------------
>>>>>> Dr. Alberto Goldoni
>>>>>> Parma, Italy
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>
>>>>
>>>>
>>>> --
>>>> -----------------------------------------------------
>>>> Dr. Alberto Goldoni
>>>> Parma, Italy
>>>>
>>>
>>
>>
>
>



-- 
-----------------------------------------------------
Dr. Alberto Goldoni
Parma, Italy
