[BioC] problem with rat database

Marc Carlson mcarlson at fhcrc.org
Tue May 10 20:38:41 CEST 2011


Hi Alberto,

The annotation packages take the probe-to-gene mapping supplied by the 
manufacturer and return the gene information associated with whichever 
gene a given probe maps to.

If the manufacturer provided no mapping between a probe and a gene, we 
have not attempted to guess one for you.
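
To see which of your probes have a manufacturer mapping in the package, 
something along these lines may help. It is only a sketch: lookup_probes() 
is a helper name made up for illustration, the two probe IDs are the ones 
from your table, and the package lookup only runs if rgug4130a.db is 
installed.

```r
# Hypothetical helper: report, for each probe, the mapped value or NA.
# `map` is a named list (probe ID -> value), e.g. what mget() returns
# for a Bimap such as rgug4130aSYMBOL.
lookup_probes <- function(probes, map) {
  values <- vapply(probes, function(p) {
    v <- map[[p]]                      # NULL when the probe is not in the list
    if (is.null(v) || all(is.na(v))) {
      NA_character_
    } else {
      paste(v, collapse = ";")         # a probe can map to more than one value
    }
  }, character(1), USE.NAMES = FALSE)
  data.frame(probe = probes, symbol = values, mapped = !is.na(values),
             stringsAsFactors = FALSE)
}

# Only attempted when the chip package is available.
if (requireNamespace("rgug4130a.db", quietly = TRUE)) {
  library(rgug4130a.db)
  probes <- c("A_43_P10328", "A_42_P552092")   # ProbeName values from the thread
  sym <- mget(probes, rgug4130aSYMBOL, ifnotfound = NA)
  print(lookup_probes(probes, sym))
}
```

Probes that come back with mapped = FALSE are the ones for which the 
package has no gene symbol to report.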

But if you feel that you have better information about where these 
probes map (perhaps you took the time to align them to the genome and 
see which genes are nearby, as you did for this one), then you could 
supply that mapping to the SQLForge code in the AnnotationDbi package 
and produce a new annotation package based on it.  The details of how 
to do this are described in one of the vignettes from the 
AnnotationDbi package:

http://www.bioconductor.org/packages/2.8/bioc/vignettes/AnnotationDbi/inst/doc/SQLForge.pdf
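
As a rough sketch of what that could look like, assuming you have your 
own probe-to-Entrez-Gene table: write_probe_map() and 
build_custom_chip_package() are names made up for illustration, the 
prefix "rgug4130acustom" is arbitrary, and the gene ID below is a 
placeholder that has to be replaced with a real Entrez Gene ID before 
building. The build step also needs the rat.db0 package installed. (In 
later Bioconductor releases makeDBPackage() is provided by 
AnnotationForge rather than AnnotationDbi.)

```r
# Write the two-column, tab-delimited, header-less probe-to-gene file
# that SQLForge reads as its input.
write_probe_map <- function(map, file) {
  write.table(map[, c("probe", "gene_id")], file = file, sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
  invisible(file)
}

# Build a chip package from that file. Not called here; see below.
build_custom_chip_package <- function(map_file, out_dir = tempdir()) {
  AnnotationDbi::makeDBPackage(
    "RATCHIP_DB",                      # schema for rat chip packages
    affy = FALSE,                      # not an Affymetrix annotation file
    prefix = "rgug4130acustom",        # hypothetical package prefix
    fileName = map_file,
    baseMapType = "eg",                # second column holds Entrez Gene IDs
    outputDir = out_dir,
    version = "0.0.1",
    manufacturer = "Agilent",
    chipName = "rgug4130a with revised probe mapping"
  )
}

# Placeholder mapping: substitute the real Entrez Gene ID first.
revised <- data.frame(probe = "A_43_P10328",
                      gene_id = "REPLACE_WITH_ENTREZ_ID",
                      stringsAsFactors = FALSE)
map_file <- write_probe_map(revised, tempfile(fileext = ".txt"))

# Once the IDs are real and rat.db0 is installed:
# build_custom_chip_package(map_file)
```

The resulting source package can then be installed and used in place of 
rgug4130a.db.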

I hope this clarifies things,


   Marc



On 05/10/2011 05:34 AM, Alberto Goldoni wrote:
> @Davis
>
> You are right! But I have tried to perform this kind of search:
>
> library("rgug4130a.db")
> x<- rgug4130aENSEMBL
> mapped_genes<- mappedkeys(x)
> xx<- as.list(x[mapped_genes])
>
> or this approach:
>
> x<- rgug4130aGENENAME
> mapped_probes<- mappedkeys(x)
> xx<- as.list(x[mapped_probes])
>
> but the results are the same: for some genes there is "unknown function".
>
> I would like to know if there is a method to perform the search using
> another database, or directly against the Rat Genome Database, or
> using biomaRt, but I don't know how.
> I have roughly 100 genes with an "unknown function", and it would be
> very useful to have a script or function that performs the search
> automatically instead of searching for genes one by one.
>
>
> Best regards.
>
> 2011/5/10 Sean Davis<sdavis2 at mail.nih.gov>:
>>
>> On Tue, May 10, 2011 at 8:17 AM, Alberto Goldoni
>> <alberto.goldoni1975 at gmail.com>  wrote:
>>> @Vincent
>>>
>>> The chip used is the "rgug4130a", so I have to use the "rgug4130a.db"
>>> database.
>>>
>>> In order to obtain the toptable this is my history:
>>>
>>> library(limma)
>>> library(vsn)
>>> targets<- readTargets("targets.txt")
>>> RG<- read.maimages(targets$FileName, source="agilent")
>>> MA<- normalizeBetweenArrays(RG, method="Aquantile")
>>> contrast.matrix<- cbind("(hda+str)-(ref)"=c(1,0),
>>>                         "(ref+str)-(ref)"=c(0,1),
>>>                         "(hda+str)-(ref+str)"=c(1,-1))
>>> rownames(contrast.matrix)<- colnames(design)
>>> fit<- lmFit(MA, design)
>>> fit2<- contrasts.fit(fit, contrast.matrix)
>>> fit2<- eBayes(fit2)
>>> geni500<-topTable(fit2,number=500,adjust="BH")
>>>
>> Hi, Alberto.
>> The data in your topTable result are taken from the feature extraction
>> result file.  In other words, rgug4130a.db is not used in what you show
>> above.  You could add to your annotation using either rgug4130a.db or
>> biomaRt, but you will need to perform these steps yourself.  As to why some
>> of your probes do not appear to have annotation, you would probably need to
>> contact Agilent as they are the source of your current annotation.
>> Hope that helps,
>> Sean
>>
>>>> sessionInfo()
>>> R version 2.12.1 (2010-12-16)
>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United
>>> Kingdom.1252
>>> [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
>>> [5] LC_TIME=English_United Kingdom.1252
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] AnnotationDbi_1.12.0 Biobase_2.10.0       limma_3.6.9
>>>
>>> loaded via a namespace (and not attached):
>>> [1] DBI_0.2-5     RSQLite_0.9-4 tools_2.12.1
>>>
>>>
>>>
>>> 2011/5/10 Vincent Carey<stvjc at channing.harvard.edu>:
>>>> 1) you did not provide sessionInfo(), which is critical for helping
>>>> you to diagnose an issue that may pertain to software version --
>>>> revisions to annotation packages can have all sorts of consequences
>>>>
>>>> 2) i am not sure rgug4130a.db has anything to do with this.
>>>>
>>>>> get("CB606456", revmap(rgug4130aSYMBOL))
>>>> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
>>>>   value for "CB606456" not found
>>>>
>>>>
>>>> and so on.  look at the featureData component of the object passed to
>>>> lmFit -- the annotation may be in there.  if this does not give
>>>> clarification please give a very explicit indication of how the
>>>> topTable was generated, going back to the structure of the object
>>>> passed to lmFit
>>>>
>>>> On Tue, May 10, 2011 at 5:30 AM, Alberto Goldoni
>>>> <alberto.goldoni1975 at gmail.com>  wrote:
>>>>> Dear All,
>>>>> I'm analyzing Agilent microarrays with the "rgug4130a.db" database and,
>>>>> using the function topTable(fit2,number=500,adjust="BH"), I have
>>>>> obtained 500 genes like these:
>>>>>
>>>>> (row name)              16096             8109
>>>>> Row                     79                40
>>>>> Col                     38                109
>>>>> ProbeUID                15309             7609
>>>>> ControlType             0                 0
>>>>> ProbeName               A_43_P10328       A_42_P552092
>>>>> GeneName                CB606456          203358_Rn
>>>>> SystematicName          CB606456          203358_Rn
>>>>> Description             unknown function  Rat c-fos mRNA.
>>>>> X.hda.str...ref.        3.988290607       5.670956889
>>>>> X.ref.str...ref.        -0.951656306      4.413365374
>>>>> X.hda.str...ref.str.    4.939946913       1.257591514
>>>>> AveExpr                 10.29735936       13.47699544
>>>>> F                       36.77263264       33.20342601
>>>>> P.Value                 0.000212298       0.000292278
>>>>> adj.P.Val               0.641094595       0.641094595
>>>>>
>>>>> but as you can see, for many genes like the first one (CB606456) the
>>>>> Description says "unknown function".
>>>>>
>>>>> So I have performed a very simple search.
>>>>> 1) First, in Ensembl, using the GeneName "CB606456" with the "Locations
>>>>> of DnaAlignFeature", it gives me the genomic location (strand): chr
>>>>> 7:16261621-16262210
>>>>> 2) Then in the Rat Genome Database
>>>>> (http://rgd.mcw.edu/tools/genes/genes_view.cgi?id=735058) I have found
>>>>> that in this position there is one gene:
>>>>>
>>>>> 735058  GENE  Angptl4  angiopoietin-like 4  7  16261623  16267852
>>>>>
>>>>> So the question is: why does the "rgug4130a.db" database in R give me
>>>>> "unknown function" when, using the genomic location from Ensembl and
>>>>> then RGD, I get the Angptl4 gene?
>>>>>
>>>>> And is there a function to make R perform this kind of search
>>>>> automatically? (I ask because among my 500 genes there are 100 "unknown
>>>>> function" genes, and it would be useful to have a function that
>>>>> performs this kind of search automatically.)
>>>>>
>>>>>
>>>>> Best regards to all and to whoever answers me.
>>>>>
>>>>> --
>>>>> -----------------------------------------------------
>>>>> Dr. Alberto Goldoni
>>>>> Parma, Italy
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>
>>>
>>> --
>>> -----------------------------------------------------
>>> Dr. Alberto Goldoni
>>> Parma, Italy
>>>
>>
>
>


