[BioC] Find Affy probes within a particular region
Martin Morgan
mtmorgan at fhcrc.org
Tue Jun 17 18:19:54 CEST 2008
Daniel Brewer <daniel.brewer at icr.ac.uk> writes:
> Hi,
>
> I was wondering what the best way is to find which Affymetrix probes are
> within a specific genomic region (chromosome, start, stop). I am not
> sure whether Biomart or the annotation.db can do this, as they both go to
> some common ID first. The annotation.db stuff also seems to have only one
> piece of position information. The other option is to download the
> annotation file from Affymetrix and load that in, but I would prefer to
> avoid that if at all possible.
>
> Does anyone have any ideas?
Not sure whether this is a good idea or not, but...
library(annotate)   # provides getAnnMap()

## create a data frame of probe genomic location
makeLookup <- function(pkg) {
    ## some entries have no names (i.e. no chromosome), so drop them
    filt <- function(x) !is.null(names(x))
    lst <- Filter(filt, as.list(getAnnMap("CHRLOC", pkg)))
    ## one row per (probe id, position); names of each entry are chromosomes
    data.frame(id=rep(names(lst), sapply(lst, length)),
               pos=unlist(lst, use.names=FALSE),
               chr=unlist(lapply(lst, names), use.names=FALSE),
               row.names=NULL)
}
this gives us
> lookup <- makeLookup("hgu95av2.db")
> head(lookup)
         id       pos chr
1   1000_at -30032926  16
2   1001_at  43539250   1
3 1002_f_at  96512452  10
4 1003_s_at 118269310  11
5 1003_s_at 118259776  11
6   1004_at 118269310  11
then...
## find probes in a single region
contains <- function(chr, start, end, table) {
    apos <- abs(table$pos)   # positions on the other strand are negative
    idx <- table$chr == chr & apos >= start & apos <= end
    table[idx, ]
}
> contains(10, 96000000, 97000000, lookup)
             id       pos chr
3     1002_f_at  96512452  10
525   1455_f_at  96688429  10
550   1477_s_at  96433367  10
4321 34078_s_at  96512452  10
6798   36320_at  96152175  10
7509 36937_s_at -96987321  10
9367   38548_at -96786519  10
One could use 'contains' with mapply to get multiple regions, but
perhaps there's a more efficient way for such bulk queries.
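
For what it's worth, here is a rough sketch of the mapply idea. So that it runs without any annotation package installed, the lookup table is a toy one built from a few rows of the output above. The 'regions' data frame and its column names are made up for illustration, and this is just one way to do it, not necessarily the fastest.

```r
## toy lookup table, copied from rows shown earlier in this post
lookup <- data.frame(
    id  = c("1001_at", "1002_f_at", "1003_s_at", "36937_s_at"),
    pos = c(43539250, 96512452, 118269310, -96987321),
    chr = c("1", "10", "11", "10"),
    stringsAsFactors = FALSE)

## same single-region query as above
contains <- function(chr, start, end, table) {
    apos <- abs(table$pos)
    idx <- table$chr == chr & apos >= start & apos <= end
    table[idx, ]
}

## hypothetical regions of interest, one row per region
regions <- data.frame(
    chr   = c("10", "11"),
    start = c(96000000, 118000000),
    end   = c(97000000, 119000000),
    stringsAsFactors = FALSE)

## one call to contains() per region; 'table' is passed unchanged each time
hits <- mapply(contains, regions$chr, regions$start, regions$end,
               MoreArgs = list(table = lookup), SIMPLIFY = FALSE)

## stack the per-region results into a single data frame
result <- do.call(rbind, hits)
```

SIMPLIFY=FALSE keeps each region's result as its own data frame in a list, so 'hits' can also be inspected region by region before stacking.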
Not sure about your concerns about just 'location'; the probes are a
common length, so you could incorporate this into the 'idx'
calculation in contains().
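
A minimal sketch of what that might look like, again on a toy table taken from the output above. The function name 'overlaps', the 'width' argument and its default of 25 are assumptions for illustration, as is the choice to extend each feature from its start position toward higher coordinates on both strands; adjust these to whatever your positions really represent.

```r
## toy lookup table, copied from rows shown earlier in this post
lookup <- data.frame(
    id  = c("1002_f_at", "36937_s_at", "38548_at"),
    pos = c(96512452, -96987321, -96786519),
    chr = c("10", "10", "10"),
    stringsAsFactors = FALSE)

## find features that overlap a region, treating each feature as the
## interval [first, last] rather than as a single position
## (width is an assumed, fixed feature length)
overlaps <- function(chr, start, end, table, width = 25) {
    first <- abs(table$pos)
    last  <- first + width - 1
    idx <- table$chr == chr & first <= end & last >= start
    table[idx, ]
}

## the region starts just after the recorded position of 1002_f_at, so a
## position-only test would miss it, but the interval test picks it up
hit  <- overlaps("10", 96512460, 96512500, lookup)
miss <- overlaps("10", 96512460, 96512500, lookup, width = 1)
```

With width = 1 the interval collapses to the single position and the function behaves like contains().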
Probably someone else will offer up a ready-made solution.
Martin
> Many thanks
>
> --
> **************************************************************
> Daniel Brewer, Ph.D.
>
> Institute of Cancer Research
> Molecular Carcinogenesis
> Email: daniel.brewer at icr.ac.uk
> **************************************************************
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793