[BioC] pd.hugene.1.0.st.v1

cstrato cstrato at aon.at
Sat Aug 1 01:03:03 CEST 2009


Dear Mark,

I am not sure, but maybe you could use the old annotation package, which 
I believe was built for release 3 of the HuGene array, see:
http://www.bioconductor.org/packages/2.3/data/annotation/html/hugene10st.db.html

Alternatively, you could use package xps, which allows you to compute 
both gene-level summaries and probeset-level summaries.
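
To illustrate what collapsing probeset-level values to transcript clusters could look like, here is a minimal sketch in base R. It is not the xps API and not a true gene-level RMA (which would summarize at the probe level); it only takes the per-sample median over the probesets that share a transcript_cluster_id. The objects ex, tc and geneLevel and all values are hypothetical toy stand-ins for exprs(tismixRMA) and fData(tismixRMA)$transcript_cluster_id from the quoted message.

```r
# Toy stand-in for exprs(tismixRMA): 6 probesets x 2 samples (invented values)
ex <- matrix(c(1, 3, 5, 10, 12, 6,
               2, 4, 6, 20, 24, 7),
             nrow = 6,
             dimnames = list(paste0("ps", 1:6), c("s1", "s2")))

# Toy stand-in for fData(tismixRMA)$transcript_cluster_id (NA = unassigned)
tc <- c("A", "A", "A", "B", "B", NA)

# Drop probesets without a transcript cluster, then group row indices by cluster
keep <- !is.na(tc)
idx <- split(which(keep), tc[keep])

# Per-sample median over the probesets of each cluster (clusters x samples)
geneLevel <- t(sapply(idx, function(i) apply(ex[i, , drop = FALSE], 2, median)))
print(geneLevel)
```

With real data one would presumably replace ex and tc by the corresponding pieces of the ExpressionSet; whether a median of probeset summaries is adequate for a given analysis is a separate question.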

Best regards
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
V.i.e.n.n.a           A.u.s.t.r.i.a
e.m.a.i.l:        cstrato at aon.at
_._._._._._._._._._._._._._._._._._


Mark Robinson wrote:
> Hi Vince.
>
> Thanks for the reply.
>
> That's good to know.  But, it only allows me to access the indices, 
> not to actually compute gene-level summaries, right?  Any way to do 
> that without building the package from scratch?
>
> Cheers,
> Mark
>
> On 31/07/2009, at 10:10 PM, Vincent Carey wrote:
>
>> On Fri, Jul 31, 2009 at 12:48 AM, Mark Robinson <mrobinson at wehi.edu.au> wrote:
>>> Hi all.
>>>
>>> I wonder if it makes more sense to have the *transcript* version of this
>>> package, instead of the *probeset* version available when you install via:
>>>
>>
>> This merits further discussion.  Note that under the current approach you
>> can obtain the transcript cluster indices for summarization using fData on
>> the output of rma:
>>
>> > class(tismix)
>> [1] "GeneFeatureSet"
>> attr(,"package")
>> [1] "oligoClasses"
>> > class(tismixRMA)
>> [1] "ExpressionSet"
>> attr(,"package")
>> [1] "Biobase"
>> > fData(tismixRMA)[1:4,]
>>         fsetid  exon_id transcript_cluster_id level crosshyb_type chrom
>> 7896737 7896737 96595542               7896736    NA             3     1
>> 7896739 7896739 96595544               7896738    NA             3     1
>> 7896741 7896741 96595546               7896740    NA             3     1
>> 7896743 7896743 96595548               7896742    NA             3     1
>>
>>                      accessions
>> 7896737                    <NA>
>> 7896739                    <NA>
>> 7896741 BC136848,BC136907,ENST00000318050,ENST00000326183,ENST00000335137,NM_001004195,NM_001005240,NM_001005484
>> 7896743 BC118988,ENST00000279067
>>
>> > dim(fData(tismixRMA))
>> [1] 253002      7
>> > dim(exprs(tismixRMA))
>> [1] 253002     33
>>
>> Annotation packages are available at both the probeset and transcript
>> cluster level, thanks to folks at City of Hope (e.g.,
>> http://www.bioconductor.org/packages/release/data/annotation/html/hugene10sttranscriptcluster.db.html).
>>
>>
>>
>>> source("http://bioconductor.org/biocLite.R")
>>> biocLite("pd.hugene.1.0.st.v1")
>>>
>>> It seems like, as a default, more people would want gene-level summaries
>>> for these arrays ... especially since ~200k (~80%) of the probesets have
>>> 3 probes or fewer.
>>>
>>> Of course I (and everyone around the world) could build this package
>>> locally from scratch using the transcript CSV, but it seems like there
>>> would be enough demand to make it available directly from BioC.  Just a
>>> thought.  Does anyone agree?
>>>
>>> Or am I missing something that will allow me to do gene-level analysis
>>> from this package?
>>>
>>> My session is below.
>>>
>>> Thanks in advance.
>>> Mark
>>>
>>>
>>>
>>> ----------------------
>>> mac1618:Desktop mrobinson$ wc -l HuGene-1_0-st-v1.na29.*.csv
>>>  257449 HuGene-1_0-st-v1.na29.hg18.probeset.csv
>>>   33317 HuGene-1_0-st-v1.na29.hg18.transcript.csv
>>> ----------------------
>>>
>>>
>>> ----------------------
>>>> library(oligo)
>>> Loading required package: oligoClasses
>>> Loading required package: Biobase
>>>
>>> Welcome to Bioconductor
>>>
>>>  Vignettes contain introductory material. To view, type
>>>  'openVignette()'. To cite Bioconductor, see
>>>  'citation("Biobase")' and for packages 'citation(pkgname)'.
>>>
>>> Loading required package: preprocessCore
>>> Welcome to oligo version 1.8.1
>>>> cf <- dir(celPath,"CEL")
>>>> fs <- read.celfiles( file.path(celPath,cf) )
>>> Loading required package: pd.hugene.1.0.st.v1
>>> Loading required package: RSQLite
>>> Loading required package: DBI
>>> Platform design info loaded.
>>> Reading in : rawData/cell_line/HuGene-1_0-st-v1//cancer1.CEL
>>> Reading in : rawData/cell_line/HuGene-1_0-st-v1//cancer2.CEL
>>> Reading in : rawData/cell_line/HuGene-1_0-st-v1//normal1.CEL
>>> Reading in : rawData/cell_line/HuGene-1_0-st-v1//normal2.CEL
>>>> rmaOligo <- oligo::rma(fs)
>>> Background correcting
>>> Normalizing
>>> Calculating Expression
>>>> dmOligo <- exprs(rmaOligo)
>>>> dim(rmaOligo)
>>> Features  Samples
>>>  253002        4
>>>> sessionInfo()
>>> R version 2.9.0 (2009-04-17)
>>> i386-apple-darwin8.11.1
>>>
>>> locale:
>>> en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] pd.hugene.1.0.st.v1_2.4.1 RSQLite_0.7-1
>>> [3] DBI_0.2-4                 oligo_1.8.1
>>> [5] preprocessCore_1.6.0      oligoClasses_1.6.0
>>> [7] Biobase_2.4.1
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affxparser_1.15.6 affyio_1.12.0     Biostrings_2.12.1 IRanges_1.2.2
>>> [5] splines_2.9.0
>>> ----------------------
>>>
>>> ------------------------------
>>> Mark Robinson, PhD (Melb)
>>> Epigenetics Laboratory, Garvan
>>> Bioinformatics Division, WEHI
>>> e: m.robinson at garvan.org.au
>>> e: mrobinson at wehi.edu.au
>>> p: +61 (0)3 9345 2628
>>> f: +61 (0)3 9347 0852
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>
>>
>> -- 
>> Vincent Carey, PhD
>> Biostatistics, Channing Lab
>> 617 525 2265
>


