[BioC] Find Affy probes within a particular region
Sean Davis
sdavis2 at mail.nih.gov
Tue Jun 17 19:15:30 CEST 2008
On Tue, Jun 17, 2008 at 12:35 PM, James W. MacDonald
<jmacdon at med.umich.edu> wrote:
> Hi Daniel,
>
> Daniel Brewer wrote:
>>
>> Hi,
>>
>> I was wondering what the best way to find which Affymetrix probes are
>> within a specific genomic region (chromosome, start, stop). I am not
>> sure if Biomart nor the annotation.db can do this as they both go to
>> some common ID first. The annotation.db stuff seems to only have one
>> position information too. The other option is to dwonload the
>> annotation file from Affymetrix and load that in, but I would prefer to
>> avoid that if at all possible.
>
> So it depends on exactly what you are trying to do here. The annotation
> package will give you the chromosome and start location for the probeset.
> You can use biomaRt to get the chromosome, start, stop for the probeset as
> well:
>
>> getBM(c("chromosome_name", "start_position","end_position"),
>> "affy_hg_u133_plus_2", "1007_s_at", mart)
> chromosome_name start_position end_position
> 1 6 30956784 30975912
>
> Of course the question posed is to find things within a specific region, not
> find the region given the probeset ID. If this is what you want, then an
> annotation.db package might be what you want. Something like
>
> library(mouse4302.db)
> tab <- toTable(mouse4302CHRLOC)
> ind <- tab[,3] == 6 & abs(tab[,2]) <= 53043660 &
> abs(tab[,2]) >= 50000000
> tab[ind,]
>
>> head(tab[ind,])
> probe_id start_location Chromosome
> 1481 1416884_at 51420614 6
> 1482 1432628_at 51420614 6
> 1483 1439421_x_at 51420614 6
> 1484 1448504_a_at 51420614 6
> 1485 1454305_at 51420614 6
> 2397 1422483_a_at -50512561 6
>
> If you then want the probeset start, end, you could feed the Affy IDs to
> biomaRt.
>
> But you specifically said probes, so you might actually want probes instead
> of probesets. If you want to get that specific then you could use the
> Biostrings package along with one of the genome packages (warning - HUGE
> download) and a probe package to get the locations of these probes. I'm not
> going to give code for that. Instead see the vignette in the BSgenome
> package, which has lots of examples.
Another alternative is to use the Ensembl core databases directly
rather than biomaRt. This is a bit tedious, but there is a treasure
trove of information embedded therein. To get the probe locations for
all mapped affy probes on the HG_U133A array, you could do:
> library(RMySQL)
> con <- dbConnect(MySQL(),user='anonymous',host='ensembldb.ensembl.org',dbname='homo_sapiens_core_47_36i')
> ### Warning--BIG SQL command coming....
> a=dbGetQuery(con,sprintf("select op.probeset as probeset,op.name as probe_name,s.name as chromosome,of.seq_region_start as start,of.seq_region_end as end,of.seq_region_strand as strand from oligo_feature of inner join oligo_probe op on op.oligo_probe_id=of.oligo_probe_id inner join oligo_array oa on oa.oligo_array_id=op.oligo_array_id inner join seq_region s on s.seq_region_id=of.seq_region_id where oa.name='%s' order by probeset",'HG_U133A'))
> a[1:5,]
probeset probe_name chromosome start end strand
1 1007_s_at 467:181; 6 30975290 30975314 1
2 1007_s_at 365:115; 6 30975523 30975547 1
3 1007_s_at 532:139; 6 30975672 30975696 1
4 1007_s_at 86:557; 6 30975472 30975496 1
5 1007_s_at 308:15; 6 30975838 30975862 1
This takes a minute or so to complete (at least from the US). You can
then use normal R to do filtering on this dataframe. Hope that helps
somewhat.
Sean
>>
>> Has anyone got any ideas.
>>
>> Many thanks
>>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
More information about the Bioconductor
mailing list