[BioC] blast probe clusters when using Affymetrix Gene Array Strips
Joao Sollari Lopes
j.sollari.lopes at gmail.com
Wed Sep 4 16:57:52 CEST 2013
Hi Jim,
Thanks for your help once again!
Joao
On 09/04/2013 03:08 PM, James W. MacDonald wrote:
> Hi Joao,
>
> On Wednesday, September 04, 2013 6:11:12 AM, Joao Sollari Lopes wrote:
>> Hi,
>>
>> I am using the Zebrafish Gene 1.1 ST Array Strip and have found some
>> transcript clusters that are differentially expressed but are not
>> annotated (although they belong to the "main" design of the array). I
>> would like to blast them, but I am not sure what to blast, as each
>> transcript cluster has several probes associated with it. Should I
>> blast them all individually? I have read about "probe set target sequence"
>> (https://stat.ethz.ch/pipermail/bioconductor/2004-March/004250.html),
>> but I am not sure if it applies to the Gene Array Strip. If it does,
>> how can I obtain these sequences?
>
> Depends on what you decide to do. You can download the transcript
> clusters here:
>
> http://www.affymetrix.com/Auth/analysis/downloads/lf/wt/ZebGene-1_1-st-v1/ZebGene-1_1-st-v1.zv9.transcript_cluster.fa.zip
>
>
> and then get the FASTA sequences you want to blast. This might not be
> exactly what you want, as the transcripts in that file correspond to
> very long sequences that a given probeset is designed to interrogate.
> As an example, probeset 12943944 is intended to interrogate a 2500 nt
> transcript, but uses 19 probes (25-mers) to do so. If you blast the
> transcript, you will see where that 2500 nt transcript is in the
> genome, but you won't know anything about the individual probes.
>
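A minimal sketch of pulling one record out of a FASTA file in base R, in case it helps with the step described above. The toy file, its header layout, and the helper name extract_fasta_record() are assumptions made for illustration; the real Affymetrix headers may look different, so the matching pattern would need adjusting.

```r
## Toy FASTA file standing in for the unzipped transcript cluster file;
## the header layout here is an assumption for illustration only.
fa <- tempfile(fileext = ".fa")
writeLines(c(">cluster_12943944 toy record",
             "ACGTACGTAC", "GGTTAACC",
             ">cluster_12943950 toy record",
             "TTTTCCCC"), fa)

## extract_fasta_record() is a hypothetical helper: it returns the header and
## sequence lines of the first record whose header contains `id`.
## Note that this is a plain substring match, so a short ID could also match
## a longer one that contains it.
extract_fasta_record <- function(file, id) {
    lines <- readLines(file)
    starts <- grep("^>", lines)                       # header line positions
    hit <- starts[grepl(id, lines[starts], fixed = TRUE)]
    if (length(hit) == 0L) stop("ID not found: ", id, call. = FALSE)
    hit <- hit[1]
    later <- starts[starts > hit]                     # headers after the hit
    end <- if (length(later) > 0L) later[1] - 1L else length(lines)
    lines[hit:end]
}

record <- extract_fasta_record(fa, "12943944")
```

The returned lines can be written back out with writeLines() to get a single-record FASTA file for blasting.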
> You could alternatively use the probe tab file, found here:
>
> http://www.affymetrix.com/Auth/analysis/downloads/lf/wt/ZebGene-1_1-st-v1/ZebGene-1_1-st-v1.zv9.probe.tab.zip
>
>
> and extract the 19 probes for that particular probeset and then use
> Jim Kent's blat program at the UCSC genome browser to align. I have a
> small function I have used in the past to convert these data to FASTA
> format that you can then upload to blat. But this requires the probe
> tab data to be in a probe package.
>
> I will give you the code, but you will have to make your own probe
> package. You will need to use makeProbePackage() in the
> AnnotationForge package. There is a vignette here:
>
> http://www.bioconductor.org/packages/release/bioc/vignettes/AnnotationForge/inst/doc/makeProbePackage.pdf
>
>
> as well as a help page, so you shouldn't have any problems with that.
>
> If you decide to go that direction, here is the function you will need
> to make FASTA files:
>
>
> blatGene <- function(affyid, probe, filename){
>     ## affyid == Affy probeset ID
>     ## probe == BioC probe package name
>     ## filename == output file name
>     require(probe, quietly = TRUE, character.only = TRUE)
>     tmp <- data.frame(get(probe))
>     if(length(affyid) > 1){
>         seqnc <- vector()
>         for(i in seq(along = affyid))
>             seqnc <- c(seqnc, tmp[tmp$Probe.Set.Name == affyid[i], 1])
>     }else{
>         seqnc <- tmp[tmp$Probe.Set.Name == affyid, 1]
>     }
>     out <- vector()
>     if(length(seqnc) > 25)
>         warning("Blat will only return values for 25 or fewer sequences!",
>                 call. = FALSE)
>     for(i in seq(along = seqnc))
>         out <- rbind(out, rbind(paste("> Probe", i, sep=""), seqnc[i]))
>     write.table(out, filename, sep="\t", quote=FALSE, row.names=FALSE,
>                 col.names=FALSE)
> }
>
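For illustration, here is a sketch of the same probes-to-FASTA idea that works from a plain data frame instead of a probe package. The data frame, its column names (cluster_id, sequence), and the helper probes_to_fasta() are all hypothetical; in practice the rows would come from the probe tab file (for example via read.delim()), whose real column names may differ.

```r
## Hypothetical stand-in for rows read from the probe tab file; the column
## names below are assumptions, not the real header of that file.
probes <- data.frame(
    cluster_id = c("12943944", "12943944", "12943950"),
    sequence   = c("ACGTACGTACGTACGTACGTACGTA",
                   "TTGCATTGCATTGCATTGCATTGCA",
                   "GGGCCCAAATTTGGGCCCAAATTTG"),
    stringsAsFactors = FALSE
)

## probes_to_fasta() is a hypothetical helper, not part of any package.
## It writes the probes of the requested cluster(s) as FASTA, one record
## per probe, and returns the selected sequences invisibly.
probes_to_fasta <- function(ids, probes, filename) {
    seqs <- probes$sequence[probes$cluster_id %in% ids]
    if (length(seqs) == 0L)
        stop("no probes found for the requested IDs", call. = FALSE)
    if (length(seqs) > 25L)
        warning("Blat will only return values for 25 or fewer sequences!",
                call. = FALSE)
    headers <- paste0(">Probe", seq_along(seqs))
    ## Interleave header and sequence lines: h1, s1, h2, s2, ...
    writeLines(as.vector(rbind(headers, seqs)), filename)
    invisible(seqs)
}

fasta_file <- tempfile(fileext = ".fa")
probes_to_fasta("12943944", probes, fasta_file)
fasta_lines <- readLines(fasta_file)
```

Unlike the loop above, %in% returns the probes in table order rather than in the order of the requested IDs, and writeLines() avoids growing a matrix with rbind() inside a loop.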
> Best,
>
> Jim
>
>
>
>>
>> Thanks,
>> Joao
>> Instituto Gulbenkian de Ciencia
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099