[BioC] blast probe clusters when using Affymetrix Gene Array Strips
James W. MacDonald
jmacdon at uw.edu
Wed Sep 4 16:08:12 CEST 2013
Hi Joao,
On Wednesday, September 04, 2013 6:11:12 AM, Joao Sollari Lopes wrote:
> Hi,
>
> I am using the Zebrafish Gene 1.1 ST Array Strip, and I have found some
> transcript clusters that are differentially expressed but are not
> annotated (although they belong to the "main" design of the array). I
> would like to blast them, but I am not sure what to blast as each
> transcript cluster has various probes associated. Should I blast them
> all individually? I have read about "probe set target sequence"
> (https://stat.ethz.ch/pipermail/bioconductor/2004-March/004250.html),
> but I am not sure if it applies to the Gene Array Strip. If it does,
> how can I obtain these sequences?
That depends on what you want to find out. You can download the
transcript cluster sequences here:
http://www.affymetrix.com/Auth/analysis/downloads/lf/wt/ZebGene-1_1-st-v1/ZebGene-1_1-st-v1.zv9.transcript_cluster.fa.zip
and then get the FASTA sequences you want to blast. This might not be
exactly what you want, as the transcripts in that file correspond to
very long sequences that a given probeset is designed to interrogate.
As an example, probeset 12943944 is intended to interrogate a 2500 nt
transcript, but uses 19 probes (25-mers) to do so. If you blast the
transcript, you will see where that 2500 nt transcript is in the
genome, but you won't know anything about the individual probes.
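If you go the transcript route, something along these lines should pull the
record(s) for a given transcript cluster out of the unzipped FASTA file. This
is only a sketch in base R: getFastaRecords() is a name made up here, and it
assumes the cluster ID shows up as a separate token somewhere in the header
line, which you should check against the actual file.

```r
## Sketch: extract FASTA records whose header contains one of the given IDs.
## Assumption: the ID is a whole token in the header (delimited by spaces,
## colons, semicolons, etc.); the real header layout is not verified here.
getFastaRecords <- function(fastafile, ids){
    ids <- as.character(ids)
    lines <- readLines(fastafile)
    lines <- lines[nzchar(lines)]
    isHeader <- grepl("^>", lines)
    headers <- lines[isHeader]
    ## record number for every line (0 = anything before the first header)
    recno <- cumsum(isHeader)
    ## paste the sequence lines of each record back together
    grp <- factor(recno[!isHeader], levels = seq_along(headers))
    seqs <- vapply(split(lines[!isHeader], grp), paste, character(1),
                   collapse = "")
    names(seqs) <- headers
    ## keep records where any header token matches an ID exactly
    tokens <- strsplit(sub("^>", "", headers), "[^[:alnum:]_.]+")
    keep <- vapply(tokens, function(tk) any(tk %in% ids), logical(1))
    seqs[keep]
}

## e.g. (file name is whatever the zip unpacks to):
## getFastaRecords("ZebGene-1_1-st-v1.zv9.transcript_cluster.fa", 12943944)
```

The result is a named character vector (header as name, sequence as value)
that you can paste into a blast or blat form.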
You could alternatively use the probe tab file, found here:
http://www.affymetrix.com/Auth/analysis/downloads/lf/wt/ZebGene-1_1-st-v1/ZebGene-1_1-st-v1.zv9.probe.tab.zip
and extract the 19 probes for that particular probeset and then use Jim
Kent's blat program at the UCSC genome browser to align. I have a small
function I have used in the past to convert these data to FASTA format
that you can then upload to blat. But this requires the probe tab data
to be in a probe package.
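If you just want to look at the probes for one transcript cluster before
building anything, the unzipped tab file can be read directly. A minimal
sketch, assuming the file is tab-delimited with a header row and has columns
called "Transcript Cluster ID" and "probe sequence" (read.delim() turns the
spaces into dots). I have not checked those names against this particular
file, so they are arguments you can change; getClusterProbes() is a made-up
name.

```r
## Sketch: return the probe sequences for one or more transcript clusters.
## Assumption: column names as described above; adjust idcol/seqcol to
## whatever names(read.delim(tabfile, nrows = 1)) actually shows.
getClusterProbes <- function(tabfile, clusterid,
                             idcol = "Transcript.Cluster.ID",
                             seqcol = "probe.sequence"){
    tab <- read.delim(tabfile, stringsAsFactors = FALSE)
    absent <- setdiff(c(idcol, seqcol), names(tab))
    if(length(absent) > 0)
        stop("Column(s) not found: ", paste(absent, collapse = ", "),
             call. = FALSE)
    tab[tab[[idcol]] %in% clusterid, seqcol]
}

## e.g. getClusterProbes("ZebGene-1_1-st-v1.zv9.probe.tab", 12943944)
```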
I will give you the code, but you will have to make your own probe
package. You will need to use makeProbePackage() in the AnnotationForge
package. There is a vignette here:
http://www.bioconductor.org/packages/release/bioc/vignettes/AnnotationForge/inst/doc/makeProbePackage.pdf
as well as a help page, so you shouldn't have any problems with that.
If you decide to go that direction, here is the function you will need
to make FASTA files:
blatGene <- function(affyid, probe, filename){
    ## affyid == Affy probeset ID
    ## probe == BioC probe package name
    ## filename == output file name
    require(probe, quietly = TRUE, character.only = TRUE)
    tmp <- data.frame(get(probe))
    if(length(affyid) > 1){
        seqnc <- vector()
        for(i in seq(along = affyid))
            seqnc <- c(seqnc, tmp[tmp$Probe.Set.Name == affyid[i], 1])
    }else{
        seqnc <- tmp[tmp$Probe.Set.Name == affyid, 1]
    }
    out <- vector()
    if(length(seqnc) > 25)
        warning("Blat will only return values for 25 or fewer sequences!",
                call. = FALSE)
    for(i in seq(along = seqnc))
        out <- rbind(out, rbind(paste(">Probe", i, sep=""), seqnc[i]))
    write.table(out, filename, sep="\t", quote=FALSE, row.names=FALSE,
                col.names=FALSE)
}
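For what it's worth, the FASTA-writing part of that function does not really
depend on the probe package. Here is a sketch of just that step, taking a
character vector of probe sequences from wherever you got them. seqsToFasta()
and its prefix argument are names invented for this example, not part of any
package.

```r
## Sketch: write probe sequences as FASTA (">Probe1", sequence, ">Probe2", ...)
seqsToFasta <- function(seqnc, filename, prefix = "Probe"){
    seqnc <- as.character(seqnc)
    if(length(seqnc) == 0)
        stop("No sequences to write", call. = FALSE)
    if(length(seqnc) > 25)
        warning("Blat will only return values for 25 or fewer sequences!",
                call. = FALSE)
    headers <- paste(">", prefix, seq_along(seqnc), sep = "")
    ## rbind() then as.vector() interleaves header/sequence pairs
    writeLines(as.vector(rbind(headers, seqnc)), filename)
    invisible(filename)
}

## e.g. seqsToFasta(c("ACGT", "TTGA"), "probes.fa")
```

The resulting file can be uploaded to blat in the same way as the output of
blatGene().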
Best,
Jim
>
> Thanks,
> Joao
> Instituto Gulbenkian de Ciencia
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099