[BioC] Affy chip annotation changes

Yuan Hao yuan.x.hao at gmail.com
Tue Nov 2 14:27:29 CET 2010


Hi Jim,

Very helpful! Thanks very much! However, "244194_at" still points to  
"ADAM22" on NetAFFX. As these two probe sets are informative in    
terms of my data set, I prefer keeping "244194_at" to dropping it.  
Would it be a problem even it maps to the 3'end of the gene?

Cheers,
Yuan

On 2 Nov 2010, at 13:08, James W. MacDonald wrote:

>
>
> On 11/1/2010 5:18 PM, Mark Cowley wrote:
>> Hi Yuan, what does 244194_at map to in the annotation csv file from
>> netaffx? There may be some discrepancy between the netaffx search
>> engine&  the netaffx csv files.
>>
>> in these situations, I like to convince myself that affy's doing the
>> right thing. Like Jim said, you could investigate further using
>> various R tools listed below, or, if you haven't used those tools
>> before, then for about 10 minutes work, you can align the 25mer
>> probes for each probeset (which you can get from netaffx) to the
>> genome using BLAT at the UCSC genome browser. This will tell you
>> whether the probes will target ADAM22 or not.
>
> I have a little function that I use to do this sort of thing:
>
> blatGene <- function(affyid, probe, filename){
>    ## affyid == Affy probeset ID
>    ## probe == BioC probe package name
>    ## filename == output file name
>    require(probe, quietly = TRUE, character.only = TRUE)
>    tmp <- data.frame(get(probe))
>    if(length(affyid) > 1){
>        seqnc <- vector()
>        for(i in seq(along = affyid))
>            seqnc <- c(seqnc, tmp[tmp$Probe.Set.Name == affyid[i], 1])
>    }else{
>        seqnc <- tmp[tmp$Probe.Set.Name == affyid,1]
>    }
>    out <- vector()
>    if(length(seqnc) > 25) warning("Blat will only return values for  
> 25 or fewer sequences!",
>                                   call. = FALSE)
>    for(i in seq(along = seqnc)) out <- rbind(out, rbind(paste(">  
> Probe", i, sep=""), seqnc[i]))
>    write.table(out, filename, sep="\t", quote=FALSE,  
> row.names=FALSE, col.names=FALSE)
> }
>
> I used this to look at the two probesets that apparently no longer  
> target ADAM22, and for hg18 both probesets are off the 3' end of the  
> gene.
>
> Best,
>
> Jim
>
>
>>
>> HTH Mark
>>
>> On 02/11/2010, at 1:42 AM, Yuan Hao wrote:
>>
>>> Hi Jim and Martin,
>>>
>>> Thank you very much for your reply! I have a feeling that this
>>> question is too vague to be answered unless down to the very end of
>>> chip design. However, just for you information. I actually checked
>>> these two missing probe sets on NetAFFX from the Affymetrix
>>> website, and one of them, "244194_at", seems still linking to
>>> ADAM22, while another is not any more.
>>>
>>> Regards, Yuan
>>>
>>> I actually checked on NetAffx from the affymetrix website for two
>>> missed probe sets corresponding to ADAM22. "2 On 1 Nov 2010, at
>>> 14:05, James W. MacDonald wrote:
>>>
>>>> Hi Yuan,
>>>>
>>>> The annotation packages we provide are simply a re-packaging of
>>>> data that we get from Affymetrix, so the simple answer is that
>>>> Affy decided that two of those probesets don't really interrogate
>>>> the transcript for that gene.
>>>>
>>>> You could investigate this further by either going to the
>>>> Affymetrix website and downloading the most recent annotation csv
>>>> file, and the csv file that was contemporaneous with the older
>>>> annotation package (2.2.11), and comparing the two. This will
>>>> likely not be that enlightening, as I would bet the annotation
>>>> simply changes for the two probesets that no longer point to
>>>> ADAM22.
>>>>
>>>> You could also investigate further using e.g., the rtracklayer,
>>>> Biostrings, and BSgenome.Hsapiens.UCSC.hg17
>>>> (BSgenome.Hsapiens.UCSC.hg19) packages.
>>>>
>>>> Best,
>>>>
>>>> Jim
>>>>
>>>>
>>>>
>>>> On 11/1/2010 7:14 AM, Yuan Hao wrote:
>>>>> Dear list,
>>>>>
>>>>> I've downloaded two versions of hgu133plus2 array annotation
>>>>> package: v2.2.11(R 2.9.2) and v2.4.5 (R 2.12.0). Some probe
>>>>> sets annotated in the former version are no longer supported in
>>>>> the recent version. For example, "ADAM22" was annotated by
>>>>> eight probe sets ("1555024_at" "206615_s_at" "206616_s_at"
>>>>> "208226_x_at" "208227_x_at" "208237_x_at" "213411_at"
>>>>> "244194_at"), which have been reduced to six in the latest
>>>>> annotation (the last two probe sets have been missing). I would
>>>>> be grateful if someone could provide some insight about it,
>>>>> especially for this case!
>>>>>
>>>>> Regards, Yuan
>>>>>
>>>>> _______________________________________________ Bioconductor
>>>>> mailing list Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
>>>>> archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>>
>>>>>
> -- 
>>>> James W. MacDonald, M.S. Biostatistician Douglas Lab University
>>>> of Michigan Department of Human Genetics 5912 Buhl 1241 E.
>>>> Catherine St. Ann Arbor MI 48109-5618 734-615-7826
>>>> **********************************************************
>>>> Electronic Mail is not secure, may not be read every day, and
>>>> should not be used for urgent or sensitive issues
>>>
>>> _______________________________________________ Bioconductor
>>> mailing list Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
>>> archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> Douglas Lab
> University of Michigan
> Department of Human Genetics
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
> 734-615-7826
> **********************************************************
> Electronic Mail is not secure, may not be read every day, and should  
> not be used for urgent or sensitive issues



More information about the Bioconductor mailing list