[BioC] hs133phsentrezg metadata

Tue Oct 17 20:22:08 CEST 2006

Hi An,

	Our custom CDF annotation package contains only the gene name for each
probeset. This is by design.

	The probes in a probeset can match different locations, or even
different chromosomes, and some probes have no match on the genome at
all. They still belong to the probeset because each of them is a perfect
match to the gene's sequence.

	So it is difficult to assign a single genome location to a probeset.
However, we do provide Map/Group files that give each probe's genome
location. They show that the probes of most probesets sit at adjacent
genome locations, but for some probesets they do not. If you are using
version 8 of the custom CDF, those files are at
http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download_v8.asp
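
	As a rough illustration of why one location per probeset does not
always exist, here is a minimal base-R sketch. It assumes the probe-level
locations have already been read into a data frame with the hypothetical
columns probeset, chr and pos. The actual Map/Group file layout may differ,
and the values below are toy data, not taken from those files. A probeset
gets a chromosome and a start/end range only when all of its mapped probes
share one chromosome.

```r
# Toy probe-level table. Column names and values are assumptions for
# illustration only; they are not the real Map/Group file format.
probes <- data.frame(
  probeset = c("A_at", "A_at", "A_at", "B_at", "B_at", "B_at", "C_at", "C_at"),
  chr      = c("1", "1", "1", "2", "7", NA, "X", "X"),
  pos      = c(100, 150, 220, 500, 900, NA, 40, 65),
  stringsAsFactors = FALSE
)

# Summarise one probeset: report a location only if every mapped probe
# lies on the same chromosome; otherwise leave chr/start/end as NA.
summarise_probeset <- function(d) {
  mapped <- d[!is.na(d$chr), , drop = FALSE]
  chrs   <- unique(mapped$chr)
  single <- length(chrs) == 1
  data.frame(
    probeset   = d$probeset[1],
    n_probes   = nrow(d),
    n_unmapped = sum(is.na(d$chr)),
    chr        = if (single) chrs else NA_character_,
    start      = if (single) min(mapped$pos) else NA_real_,
    end        = if (single) max(mapped$pos) else NA_real_,
    stringsAsFactors = FALSE
  )
}

# Apply the summary to each probeset and stack the one-row results.
loc <- do.call(rbind, lapply(split(probes, probes$probeset), summarise_probeset))
print(loc)
```

	In this toy table B_at has probes on two chromosomes plus one unmapped
probe, so it gets no single location, while A_at and C_at do. Whether a
start/end range is meaningful for a probeset whose probes are far apart on
the same chromosome is a separate judgment call that this sketch does not
make.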

	For more detail, please google 'custom cdf' or just drop me a
message.

Best,
Manhong Dai
> Message: 5
> Date: Fri, 13 Oct 2006 09:36:20 -0400
> From: "James W. MacDonald" <jmacdon at med.umich.edu>
> Subject: Re: [BioC] hs133phsentrezg metadata
> To: "De Bondt, An-7114 [PRDBE]" <ADBONDT at PRDBE.jnj.com>,
> 	BioConductor_list <bioconductor at stat.math.ethz.ch>
> Message-ID: <452F9654.6000902 at med.umich.edu>
> Content-Type: text/plain;  charset="utf-8";  format=flowed
> 
> Hi An,
> 
> You should not respond just to me. The goal is to keep these 
> conversations on the list so others can benefit as well.
> 
> De Bondt, An-7114 [PRDBE] wrote:
> > Dear Jim,
> > 
> > Indeed, this is the info I was looking for, thanks!  
> > Could you also give me guidance on how I can get this CHRLOC info into a
> > metadata package like e.g. hs133phsentrezg?  I guess I would have to create
> > a .CDF file first but I do not know how this file needs to be set up...
> > Probably a tab delimited file with as many rows as gene identifiers on
> > the chip and with the following columns: 
> > 	gene identifiers on the chip
> > 	gene name
> > 	chromosome
> > 	chromosome_start of the identifier on the chip
> > 	chromosome_end of the identifier on the chip
> > 
> > Is this right or should I post this on the mailing list?
> 
> Well, trying to reverse-engineer a metaData package is probably more 
> trouble than it is worth. Why exactly do you need this data to be in a 
> package? The rationale for the metaData packages is to supply end users 
> with a single package that has a relatively simple interface to the 
> data, but once you have the data in your working environment, it is 
> there for you to use.
> 
> Anyway, if you really want the data in an annotation package, you can 
> use AnnBuilder to make one yourself. There are a couple of vignettes in 
> that package that show how to do things, and if you have problems, there 
> are plenty of threads on the list that you can search for common answers.
> 
> I guess the only compelling reason I can think of for wanting a package 
> is if the goal is to use annaffy to output annotated tables with your 
> data. Is this the case? If so, you can do the same sort of thing using 
> biomaRt and htmlpage() in the annotate package. There is a vignette in 
> biomaRt that shows how to do that. I have also written some functions 
> for affycoretools that automate the process, but they currently don't 
> include the chromosomal location, mainly because I don't find that 
> information very useful for say, an HTML table. However, if there is 
> interest, I am willing to add that capability.
> 
> Best,
> 
> Jim
> 
> 
> > 
> > Thanks,
> > An
> > 
> > 
> > -----Original Message-----
> > From: James W. MacDonald [mailto:jmacdon at med.umich.edu]
> > Sent: Thursday, 12 October 2006 17:04
> > To: De Bondt, An-7114 [PRDBE]
> > Cc: 'bioconductor at stat.math.ethz.ch'
> > Subject: Re: [BioC] hs133phsentrezg metadata
> > 
> > 
> > De Bondt, An-7114 [PRDBE] wrote:
> > 
> >>Dear useR,
> >>
> >>The 'hs133phsentrezg' metadata have only 'hs133phsentrezgGENENAME' mapping
> >>info.  The 'hgu133plus2' metadata has also 'hgu133plus2CHRLOC' info (besides
> >>lots of other info).  How can I find 'hs133phsentrezgCHRLOC' info?
> > 
> > 
> > I hadn't realized how sparse the information in these annotation 
> > packages really is. I think your best bet is to use biomaRt to get the 
> > annotation you want.
> > 
> > Something like
> > 
> >  > mart <- useMart("ensembl","hsapiens_gene_ensembl")
> > Checking attributes and filters ... ok
> >  > a <- getBM("chromosome_location", "entrezgene", sub("_at", "", 
> > ls(hs133phsentrezgGENENAME)[1:10]), mart=mart, output="list")
> >  > sapply(a[[1]], length)
> >      1    10   100  1000 10000 10001 10002 10003 10004 10005
> >     62   157   233  1457   907   105    92   371    80   123
> >  > a[[1]][[1]]
> >   [1] 63544227 63545175 63546378 63546412 63547557
> >   [6] 63547599 63547672 63548610 63548624 63548943
> > [11] 63549372 63549373 63549374 63549543 63550044
> > [16] 63550148 63556679 63556692 63556702 63556866
> > [21] 63556880 63556894 63556903 63557399 63557422
> > [26] 63557669 63558246 63558375 63559327 63559846
> > [31] 63560064 63560292 63560992 63561327 63561328
> > [36] 63561647 63561650 63550747 63553556 63556291
> > [41] 63556303 63550430 63550488 63550864 63550878
> > [46] 63551634 63552081 63552199 63552253 63552624
> > [51] 63552827 63553507 63554072 63554973 63554974
> > [56] 63554975 63554981 63554982 63554984 63555253
> > [61] 63555261 63555962
> > 
> > Should do the trick.
> > 
> > HTH,
> > 
> > Jim
> > 
> > 
> > 
> >>Thanks in advance,
> >>An De Bondt
> >>
> >>
> >>
> > 
> > 
> 
> 
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
> 
> 