[BioC] hs133phsentrezg metadata

Fri Oct 13 17:48:18 CEST 2006

Dear Jim,

The reason I need the info in a package is that I want to use buildMACAT() from
the macat package. This function takes as input a data file, the associated
biological information per sample, and the chip annotation data package.

Thanks for your suggestions,
An

-----Original Message-----
From: James W. MacDonald [mailto:jmacdon at med.umich.edu]
Sent: Friday, 13 October 2006 15:36
To: De Bondt, An-7114 [PRDBE]; BioConductor_list
Subject: Re: [BioC] hs133phsentrezg metadata

Hi An,

You should not respond just to me. The goal is to keep these 
conversations on the list so others can benefit as well.

De Bondt, An-7114 [PRDBE] wrote:
> Dear Jim,
> 
> Indeed, this is the info I was looking for, thanks!  
> Could you also give me guidance on how I can get this CHRLOC info into a
> metadata package like, e.g., hs133phsentrezg?  I guess I would have to create
> a .CDF file first, but I do not know how this file needs to be set up...
> Probably a tab-delimited file with as many rows as there are gene identifiers
> on the chip and the following columns:
> 	gene identifiers on the chip
> 	gene name
> 	chromosome
> 	chromosome_start of the identifier on the chip
> 	chromosome_end of the identifier on the chip
> 
> Is this right, or should I post this on the mailing list?

Well, trying to reverse-engineer a metaData package is probably more 
trouble than it is worth. Why exactly do you need this data to be in a 
package? The rationale for the metaData packages is to supply end users 
with a single package that has a relatively simple interface to the 
data, but once you have the data in your working environment, it is 
there for you to use.
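For example, the list returned by the biomaRt call quoted further down can be
turned into a start/end table like the one described above directly in an R
session. What follows is a minimal sketch in base R, not an existing function
in any package. Assumptions: `a` is a hand-built stand-in for the
getBM(..., output = "list") result, filled with a few of the positions for
Entrez Gene ID 1 printed in that transcript; taking the smallest and largest
returned position as the gene's span is an approximation; and the column
names (probe_id, chrom_start, chrom_end) are illustrative only. Gene name and
chromosome are left out because the query shown does not return them.

```r
# Stand-in for the getBM(..., output = "list") result: a list whose first
# element is a named list with one vector of positions per Entrez Gene ID.
# The values are a subset of those printed for gene "1" in the transcript.
a <- list(list(
  "1" = c(63544227, 63545175, 63561650, 63550747, 63555962)
))

positions <- a[[1]]

# Collapse each gene's positions to a single span: (smallest, largest).
spans <- t(vapply(positions, range, numeric(2)))

# One row per identifier; hs133phsentrezg IDs are Entrez IDs plus "_at".
loc <- data.frame(
  probe_id    = paste0(names(positions), "_at"),
  chrom_start = unname(spans[, 1]),
  chrom_end   = unname(spans[, 2]),
  stringsAsFactors = FALSE
)

# Write a tab-delimited file and read it back to check the round trip.
out <- tempfile(fileext = ".txt")
write.table(loc, out, sep = "\t", quote = FALSE, row.names = FALSE)
loc2 <- read.delim(out, stringsAsFactors = FALSE)
print(loc2)
```

With real data, `a` would simply be the object returned by getBM(); the
remaining lines stay the same.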

Anyway, if you really want the data in an annotation package, you can 
use AnnBuilder to make one yourself. There are a couple of vignettes in 
that package that show how to do things, and if you have problems, there 
are plenty of threads on the list that you can search for common answers.

The only compelling reason I can think of for wanting a package is if the 
goal is to use annaffy to output annotated tables with your data. Is this 
the case? If so, you can do the same sort of thing using biomaRt and 
htmlpage() in the annotate package. There is a vignette in biomaRt that 
shows how to do that. I have also written some functions for affycoretools 
that automate the process, but they currently don't include the 
chromosomal location, mainly because I don't find that information very 
useful in, say, an HTML table. However, if there is interest, I am willing 
to add that capability.

Best,

Jim

> 
> Thanks,
> An
> 
> 
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at med.umich.edu]
> Sent: Thursday, 12 October 2006 17:04
> To: De Bondt, An-7114 [PRDBE]
> Cc: 'bioconductor at stat.math.ethz.ch'
> Subject: Re: [BioC] hs133phsentrezg metadata
> 
> 
> De Bondt, An-7114 [PRDBE] wrote:
> 
>>Dear useR,
>>
>>The 'hs133phsentrezg' metadata package has only 'hs133phsentrezgGENENAME'
>>mapping info.  The 'hgu133plus2' metadata package also has
>>'hgu133plus2CHRLOC' info (besides lots of other info).  How can I find
>>'hs133phsentrezgCHRLOC' info?
> 
> 
> I hadn't realized how sparse the information in these annotation 
> packages really is. I think your best bet is to use biomaRt to get the 
> annotation you want.
> 
> Something like
> 
>  > mart <- useMart("ensembl","hsapiens_gene_ensembl")
> Checking attributes and filters ... ok
>  > a <- getBM("chromosome_location", "entrezgene", sub("_at", "", 
> ls(hs133phsentrezgGENENAME)[1:10]), mart=mart, output="list")
>  > sapply(a[[1]], length)
>      1    10   100  1000 10000 10001 10002 10003 10004 10005
>     62   157   233  1457   907   105    92   371    80   123
>  > a[[1]][[1]]
>   [1] 63544227 63545175 63546378 63546412 63547557
>   [6] 63547599 63547672 63548610 63548624 63548943
> [11] 63549372 63549373 63549374 63549543 63550044
> [16] 63550148 63556679 63556692 63556702 63556866
> [21] 63556880 63556894 63556903 63557399 63557422
> [26] 63557669 63558246 63558375 63559327 63559846
> [31] 63560064 63560292 63560992 63561327 63561328
> [36] 63561647 63561650 63550747 63553556 63556291
> [41] 63556303 63550430 63550488 63550864 63550878
> [46] 63551634 63552081 63552199 63552253 63552624
> [51] 63552827 63553507 63554072 63554973 63554974
> [56] 63554975 63554981 63554982 63554984 63555253
> [61] 63555261 63555962
> 
> Should do the trick.
> 
> HTH,
> 
> Jim
> 
> 
> 
>>Thanks in advance,
>>An De Bondt
>>
>>
>>
>>
>>_______________________________________________
>>Bioconductor mailing list
>>Bioconductor at stat.math.ethz.ch
>>https://stat.ethz.ch/mailman/listinfo/bioconductor
>>Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
