[BioC] pd.hugene.1.0.st.v1
Mark Robinson
mrobinson at wehi.EDU.AU
Sat Aug 1 00:15:21 CEST 2009
Hi Vince.
Thanks for the reply.
That's good to know. But, it only allows me to access the indices,
not to actually compute gene-level summaries, right? Any way to do
that without building the package from scratch?
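
A rough stopgap, sketched here under assumptions rather than as a recommended method: collapse the probeset-level RMA values to one row per transcript cluster by taking the median within each cluster. This is not equivalent to fitting RMA at the transcript level. The toy matrix and the names probesetExprs, transcriptCluster and summarizeByCluster are hypothetical; in a real session the inputs would presumably be exprs(rmaOligo) and fData(rmaOligo)$transcript_cluster_id.

```r
# Toy stand-ins for exprs(rmaOligo) and
# fData(rmaOligo)$transcript_cluster_id; all values are made up.
probesetExprs <- matrix(
  c(5, 7, 9, 2, 4,    # sample "cancer1"
    6, 8, 10, 3, 5),  # sample "normal1"
  ncol = 2,
  dimnames = list(paste0("ps", 1:5), c("cancer1", "normal1"))
)
transcriptCluster <- c("tc1", "tc1", "tc1", "tc2", "tc2")

# Hypothetical helper: per-sample median of the probeset-level values
# within each transcript cluster. Probesets with no cluster are dropped.
summarizeByCluster <- function(x, cluster) {
  keep <- !is.na(cluster)
  groups <- split(which(keep), cluster[keep])
  do.call(rbind, lapply(groups, function(i) {
    apply(x[i, , drop = FALSE], 2, median)
  }))
}

geneLevel <- summarizeByCluster(probesetExprs, transcriptCluster)
print(geneLevel)
```

The median is used only because it is simple and robust to a single outlying probeset; any other per-cluster summary could be swapped in.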
Cheers,
Mark
On 31/07/2009, at 10:10 PM, Vincent Carey wrote:
> On Fri, Jul 31, 2009 at 12:48 AM, Mark Robinson <mrobinson at wehi.edu.au> wrote:
>> Hi all.
>>
>> I wonder if it makes more sense to have the *transcript* version of this
>> package, instead of the *probeset* version available when you install via:
>>
>
> This merits further discussion. Note that under the current approach you
> can obtain the transcript cluster indices for summarization using fData
> on the output of rma:
>
>> class(tismix)
> [1] "GeneFeatureSet"
> attr(,"package")
> [1] "oligoClasses"
>> class(tismixRMA)
> [1] "ExpressionSet"
> attr(,"package")
> [1] "Biobase"
>> fData(tismixRMA)[1:4,]
>          fsetid  exon_id transcript_cluster_id level crosshyb_type chrom
> 7896737 7896737 96595542               7896736    NA             3     1
> 7896739 7896739 96595544               7896738    NA             3     1
> 7896741 7896741 96595546               7896740    NA             3     1
> 7896743 7896743 96595548               7896742    NA             3     1
>
>         accessions
> 7896737 <NA>
> 7896739 <NA>
> 7896741 BC136848,BC136907,ENST00000318050,ENST00000326183,ENST00000335137,NM_001004195,NM_001005240,NM_001005484
> 7896743 BC118988,ENST00000279067
>
>> dim(fData(tismixRMA))
> [1] 253002 7
>> dim(exprs(tismixRMA))
> [1] 253002 33
>
> Annotation packages are available at both the probeset and transcript
> cluster level, thanks to folks at City of Hope (e.g.,
> http://www.bioconductor.org/packages/release/data/annotation/html/hugene10sttranscriptcluster.db.html)
>
>
>> source("http://bioconductor.org/biocLite.R")
>> biocLite("pd.hugene.1.0.st.v1")
>>
>> It seems like, as a default, more people would want gene-level summaries
>> for these arrays ... especially since ~200k (~80%) of the probesets have
>> 3 probes or fewer.
>>
>> Of course I (and everyone around the world) could build this package
>> locally from scratch using the transcript CSV, but it seems like there
>> would be enough demand to make it available directly from BioC. Just a
>> thought. Does anyone agree?
>>
>> Or, am I missing something that will allow me to do gene-level analysis
>> from this package?
>>
>> My session is below.
>>
>> Thanks in advance.
>> Mark
>>
>>
>>
>> ----------------------
>> mac1618:Desktop mrobinson$ wc -l HuGene-1_0-st-v1.na29.*.csv
>> 257449 HuGene-1_0-st-v1.na29.hg18.probeset.csv
>> 33317 HuGene-1_0-st-v1.na29.hg18.transcript.csv
>> ----------------------
>>
>>
>> ----------------------
>>> library(oligo)
>> Loading required package: oligoClasses
>> Loading required package: Biobase
>>
>> Welcome to Bioconductor
>>
>> Vignettes contain introductory material. To view, type
>> 'openVignette()'. To cite Bioconductor, see
>> 'citation("Biobase")' and for packages 'citation(pkgname)'.
>>
>> Loading required package: preprocessCore
>> Welcome to oligo version 1.8.1
>>> cf <- dir(celPath,"CEL")
>>> fs <- read.celfiles( file.path(celPath,cf) )
>> Loading required package: pd.hugene.1.0.st.v1
>> Loading required package: RSQLite
>> Loading required package: DBI
>> Platform design info loaded.
>> Reading in : rawData/cell_line/HuGene-1_0-st-v1//cancer1.CEL
>> Reading in : rawData/cell_line/HuGene-1_0-st-v1//cancer2.CEL
>> Reading in : rawData/cell_line/HuGene-1_0-st-v1//normal1.CEL
>> Reading in : rawData/cell_line/HuGene-1_0-st-v1//normal2.CEL
>>> rmaOligo <- oligo::rma(fs)
>> Background correcting
>> Normalizing
>> Calculating Expression
>>> dmOligo <- exprs(rmaOligo)
>>> dim(rmaOligo)
>> Features Samples
>> 253002 4
>>> sessionInfo()
>> R version 2.9.0 (2009-04-17)
>> i386-apple-darwin8.11.1
>>
>> locale:
>> en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] pd.hugene.1.0.st.v1_2.4.1 RSQLite_0.7-1
>> [3] DBI_0.2-4 oligo_1.8.1
>> [5] preprocessCore_1.6.0 oligoClasses_1.6.0
>> [7] Biobase_2.4.1
>>
>> loaded via a namespace (and not attached):
>> [1] affxparser_1.15.6 affyio_1.12.0     Biostrings_2.12.1 IRanges_1.2.2
>> [5] splines_2.9.0
>> ----------------------
>>
>> ------------------------------
>> Mark Robinson, PhD (Melb)
>> Epigenetics Laboratory, Garvan
>> Bioinformatics Division, WEHI
>> e: m.robinson at garvan.org.au
>> e: mrobinson at wehi.edu.au
>> p: +61 (0)3 9345 2628
>> f: +61 (0)3 9347 0852
>
>
>
> --
> Vincent Carey, PhD
> Biostatistics, Channing Lab
> 617 525 2265