[BioC] blast probe clusters when using Affymetrix Gene Array Strips

Wed Sep 4 16:57:52 CEST 2013

Hi Jim,

Thanks for your help once again!
Joao

On 09/04/2013 03:08 PM, James W. MacDonald wrote:
> Hi Joao,
>
> On Wednesday, September 04, 2013 6:11:12 AM, Joao Sollari Lopes wrote:
>> Hi,
>>
>> I am using the Zebrafish Gene 1.1 ST Array Strip and have found some
>> transcript clusters that are differentially expressed but are not
>> annotated (although they belong to the "main" design of the array). I
>> would like to blast them, but I am not sure what to blast as each
>> transcript cluster has various probes associated. Should I blast them
>> all individually? I have read about "probe set target sequence"
>> (https://stat.ethz.ch/pipermail/bioconductor/2004-March/004250.html),
>> but I am not sure if it applies to the Gene Array Strip. If it does,
>> how can I obtain these sequences?
>
> Depends on what you decide to do. You can download the transcript 
> clusters here:
>
> http://www.affymetrix.com/Auth/analysis/downloads/lf/wt/ZebGene-1_1-st-v1/ZebGene-1_1-st-v1.zv9.transcript_cluster.fa.zip 
>
>
> and then get the FASTA sequences you want to blast. This might not be 
> exactly what you want, as the transcripts in that file correspond to 
> very long sequences that a given probeset is designed to interrogate. 
> As an example, probeset 12943944 is intended to interrogate a 2500 nt 
> transcript, but uses 19 probes (25-mers) to do so. If you blast the 
> transcript, you will see where that 2500 nt transcript is in the 
> genome, but you won't know anything about the individual probes.
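
A minimal sketch of pulling a single transcript cluster out of the unzipped FASTA file with base R, so it can be pasted into blast. It assumes the cluster ID appears somewhere in the ">" header line; the header layout of the Affymetrix file is not shown in this thread, so check it first. The helper name getFastaRecord and the toy file are made up for illustration.

```r
## Hypothetical helper: return the sequence of the first FASTA record whose
## header contains `id`. Assumes (unverified) that the header holds the ID.
getFastaRecord <- function(fasta, id){
    lines <- readLines(fasta)
    hdr <- grep("^>", lines)                      # positions of header lines
    hit <- hdr[grepl(id, lines[hdr], fixed = TRUE)]
    if(length(hit) == 0) stop("no record found for ", id)
    start <- hit[1] + 1
    nxt <- hdr[hdr > hit[1]]                      # headers after the match
    end <- if(length(nxt) > 0) nxt[1] - 1 else length(lines)
    if(start > end) return("")                    # header without sequence
    paste(lines[start:end], collapse = "")
}

## toy two-record file standing in for the real download
fa <- tempfile(fileext = ".fa")
writeLines(c(">12943944 toy header", "ACGTACGT", "TTGGCCAA",
             ">12943945 toy header", "GGGGCCCC"), fa)
getFastaRecord(fa, "12943944")
```

Because the match is a plain substring search, an ID that is a prefix of a longer ID could hit the wrong record; tighten the pattern once the real header format is known.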
>
> You could alternatively use the probe tab file, found here:
>
> http://www.affymetrix.com/Auth/analysis/downloads/lf/wt/ZebGene-1_1-st-v1/ZebGene-1_1-st-v1.zv9.probe.tab.zip 
>
>
> and extract the 19 probes for that particular probeset and then use 
> Jim Kent's blat program at the UCSC genome browser to align. I have a 
> small function I have used in the past to convert these data to FASTA 
> format that you can then upload to blat. But this requires the probe 
> tab data to be in a probe package.
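
If building a probe package is more than is needed, the same extraction can probably be done straight from the probe tab file once it is read into a data frame. The sketch below is only that: the column names Transcript.Cluster.ID and probe.sequence are assumptions about the file header (check the real one after read.delim()), and probesToFasta is a made-up name.

```r
## Hypothetical helper: write the probes of one transcript cluster as FASTA.
## The default column names are assumptions; adjust them to the real header.
probesToFasta <- function(tab, id, filename,
                          idcol = "Transcript.Cluster.ID",
                          seqcol = "probe.sequence"){
    seqnc <- as.character(tab[tab[[idcol]] == id, seqcol])
    if(length(seqnc) == 0) stop("no probes found for ", id)
    ## interleave header and sequence lines: >Probe1, seq1, >Probe2, seq2, ...
    out <- as.vector(rbind(paste0(">Probe", seq_along(seqnc)), seqnc))
    writeLines(out, filename)
    invisible(seqnc)
}

## toy table standing in for the unzipped probe tab file
tab <- data.frame(Transcript.Cluster.ID = c(12943944, 12943944, 12943945),
                  probe.sequence = c("ACGTACGTACGTACGTACGTACGTA",
                                     "TTTTACGTACGTACGTACGTACGTA",
                                     "GGGGACGTACGTACGTACGTACGTA"),
                  stringsAsFactors = FALSE)
fa <- tempfile(fileext = ".fa")
probesToFasta(tab, 12943944, fa)
readLines(fa)
```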
>
> I will give you the code, but you will have to make your own probe 
> package. You will need to use makeProbePackage() in the 
> AnnotationForge package. There is a vignette here:
>
> http://www.bioconductor.org/packages/release/bioc/vignettes/AnnotationForge/inst/doc/makeProbePackage.pdf 
>
>
> as well as a help page, so you shouldn't have any problems with that.
>
> If you decide to go that direction, here is the function you will need 
> to make FASTA files:
>
>
> blatGene <- function(affyid, probe, filename){
>    ## affyid == Affy probeset ID
>    ## probe == BioC probe package name
>    ## filename == output file name
>    require(probe, quietly = TRUE, character.only = TRUE)
>    tmp <- data.frame(get(probe))
>    if(length(affyid) > 1){
>        seqnc <- vector()
>        for(i in seq(along = affyid))
>            seqnc <- c(seqnc, tmp[tmp$Probe.Set.Name == affyid[i], 1])
>    }else{
>        seqnc <- tmp[tmp$Probe.Set.Name == affyid,1]
>    }
>    out <- vector()
>    if(length(seqnc) > 25)
>        warning("Blat will only return values for 25 or fewer sequences!",
>                call. = FALSE)
>    for(i in seq(along = seqnc))
>        out <- rbind(out, rbind(paste("> Probe", i, sep=""), seqnc[i]))
>    write.table(out, filename, sep="\t", quote=FALSE, row.names=FALSE,
>                col.names=FALSE)
> }
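
Since blatGene() warns that Blat only returns results for 25 or fewer sequences, one way to handle several probesets at once is to split the probes into files of at most 25 each and upload them one by one. A rough sketch under that assumption; writeFastaBatches and its arguments are made up for illustration and are not part of Jim's function.

```r
## Hypothetical helper: split sequences into batches of at most `size`
## and write one FASTA file per batch; returns the file names.
writeFastaBatches <- function(seqnc, prefix, size = 25){
    batch <- ceiling(seq_along(seqnc) / size)     # batch number per sequence
    files <- character()
    for(b in unique(batch)){
        idx <- which(batch == b)
        ## keep the original probe numbering across files
        out <- as.vector(rbind(paste0(">Probe", idx), seqnc[idx]))
        f <- paste0(prefix, "_", b, ".fa")
        writeLines(out, f)
        files <- c(files, f)
    }
    files
}

## toy input: 60 identical 25-mers
seqnc <- rep("ACGTACGTACGTACGTACGTACGTA", 60)
files <- writeFastaBatches(seqnc, file.path(tempdir(), "probes"))
length(files)
```

With 60 toy sequences this writes three files holding 25, 25 and 10 probes.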
>
> Best,
>
> Jim
>
>
>
>>
>> Thanks,
>> Joao
>> Instituto Gulbenkian de Ciencia
>>
>
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099