[BioC] Inconsistent annotation of affy probeset on Affymetrix chip for rat: 230.2
Robert Gentleman
rgentlem at fhcrc.org
Thu Jul 3 06:32:07 CEST 2008
Hi,
It is actually a bit simpler than Mark has suggested.
biocLite("rat2302probe")
will get the probe sequences used (at least as reported in late March
- but they should not change)
then you could ask Herve to build a BSgenome package for Rat, and use
Biostrings to do the matching...
or save the probes and use BLAT or any other string matcher (MAQ)
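
A minimal sketch of the Biostrings route, in case it helps. It assumes Biostrings and rat2302probe are installed; count_probe_hits is a made-up helper name, not part of any package. It counts exact matches only, on both strands, so it is much cruder than a BLAT alignment.

```r
library(Biostrings)
library(rat2302probe)   # installed with biocLite("rat2302probe") as above
data(rat2302probe)      # data frame with one row per probe

# Hypothetical helper: count exact matches of each probe on both strands
# of a single subject sequence (a DNAString, e.g. one chromosome).
count_probe_hits <- function(probe_seqs, subject) {
  probes <- DNAStringSet(probe_seqs)
  data.frame(
    probe   = as.character(probes),
    forward = countPDict(PDict(probes), subject),
    reverse = countPDict(PDict(reverseComplement(probes)), subject),
    stringsAsFactors = FALSE
  )
}

# Probes that make up the probeset discussed in this thread.
ps <- rat2302probe[rat2302probe$Probe.Set.Name == "1375535_at", ]
```

Once a rat BSgenome package is available (an assumption here; in a package such as BSgenome.Rnorvegicus.UCSC.rn4 the genome object is called Rnorvegicus), something like count_probe_hits(ps$sequence, Rnorvegicus[["chr6"]]) and the same call with "chrX" should show which chromosome the probes actually hit. Probes with no exact hit on either would need a mismatch-tolerant search or BLAT.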
best wishes
Robert
Mark Cowley wrote:
> Hi Christoph,
> I would recommend obtaining the sequences of the actual probes that make
> up this probeset (from NetAffx), then align them to the latest genome
> using BLAT, so that you can convince yourself which mRNA these
> probes are most likely to detect.
> I find that aligning the probes often tells you far more information
> than the Affymetrix consensus sequence ever will.
> Be very concerned if your probes start aligning all over the genome!
>
> cheers,
> Mark
>
> On 03/07/2008, at 3:47 AM, Marc Carlson wrote:
>
>> Christoph Preuss wrote:
>>> Hi everyone,
>>>
>>> We analyzed a global expression microarray data set using gcrma for the
>>> normalization step and limma for finding differentially expressed
>>> genes. One of the most significant probesets (ProbeSetID annotation
>>> "1375535_at") in terms of differential expression is annotated as:
>>> Probeset "1375535_at"
>>> -Gene Symbol: Lpin1
>>> - Location: Chr 6
>>>
>>> in the bioconductor package "rat2302" / "rat2302.db".
>>>
>>> We also looked at the Affymetrix web site, where the same probeset was
>>> annotated as "Transcribed sequence" on chromosome X.
>>>
>>> Affymetrix Annotation RG 230 2.0 Chip:
>>> -ProbeSetID: 1375535_at
>>> -Target Sequence:
>>>
>>> >RAT230_2:1375535_AT
>>> gaagttagagagctgtttccccactttacattttaaaatatgtatgccaggatntaatca
>>> ttcctttaagtgtacacttcaaggagagatgtgccgaataagaaaatagctttctctagc
>>> gtgaagggttttgcgtccgccgagttcttaaggtcttttttaagagctactgtgtatgag
>>> tgtgtgtatgtgtgcgcatgcatgttcctgcgactagtcattcattcacatggtgatcag
>>> acaacaatgggagctggttcgtctaccttatcttgtgggtcctggagttcaatctcagat
>>> catcaggctgggcagcaagtgccttcaccctccgagccatcttgccatcccacagctgag
>>> cgtctaatatgacattgccgatga
>>>
>>> Interestingly, the given target sequence for the probeset matches only
>>> a mouse sequence and not even a rat mRNA (blastn search).
>>>
>>> The question is which annotation should we trust?
>>> Is there any chance to validate the probeset annotation?
>>> Many thanks in advance for any help.
>>>
>>> cheers,
>>>
>>> Christoph Preuss
>>>
>>> (Leibniz-Institute for Arteriosclerosis Research, University of
>>> Muenster, Germany)
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>> Hi Christoph,
>>
>> I can only really speak for the Bioconductor annotations which are
>> generated from public sources along with an initial mapping of the
>> probe or probeset to a public accession (usually this is a Genbank,
>> Entrez ID or a related type of ID). In the case of "1375535_at", the
>> probeset is an Affymetrix probeset and so we are ultimately at the
>> mercy of Affymetrix to accurately tell us what this probeset is in
>> this initial mapping, but after this we do the rest ourselves by using
>> public sources. We join the probeset-to-ID mapping to additional
>> information gathered from public sources (primarily NCBI) to get the
>> rest of the information in the package. The file that you get from
>> Affymetrix may also have a lot of the same data as our packages, but
>> unless they describe it somewhere, I don't think we actually know for
>> certain where they collected all of their information from. The only
>> information that we ever actually take from them is the initial
>> mapping of their probeset onto a public accession.
>>
>> I dug up the latest Affymetrix mapping files that we used to generate
>> this package and investigated. From the file that I have (which was
>> collected in late March) the probeset you listed is indicated to be
>> Lpin1, and also to be located on Chromosome 6 which agrees completely
>> with the information that we gathered from NCBI and GoldenPath at
>> that time. As of this morning, NCBI still lists this gene as being
>> Lipin1 and being located on Chromosome 6. However, there is also a
>> field right next to that in the Affymetrix file that is called
>> "Alignments" which lists the X chromosome. But when I pull up an even
>> more recent file from Affymetrix, then I see that they no longer list
>> the location of this gene and have now replaced that value with a
>> "---"; they also no longer list the gene's name or symbol. But they
>> still list Chromosome "X" in the alignment field and have even
>> assigned different accessions to this probeset.
>> So the short answer is that Affymetrix has changed their mind about
>> what they are claiming this probeset is measuring.
>>
>>
>> I hope this helps you,
>>
>>
>> Marc
>>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org