[BioC] Probeset/Transcript cluster definitions for HTA2.0 using pdInfoBuilder
Guilherme Rocha
gvrocha at gmail.com
Fri Aug 29 14:28:47 CEST 2014
Thank you.
Your reply helps a lot in letting me know where to look for things. :)
Best,
G
On Wed, Aug 27, 2014 at 11:08 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
> Hi Guilherme,
>
>
> On Tue, Aug 26, 2014 at 10:00 AM, Guilherme Rocha <gvrocha at gmail.com>
> wrote:
>
>> Hi all,
>>
>> I have constructed a package information file for Affy's HTA 2.0 chip
>> using pdInfoBuilder as shown below.
>> It appears that the annotation files have been upgraded to na34 (from
>> na33 in probeFile and transFile).
>>
>> Specific question: do the annotation files affect which probes are
>> included in each probeset/transcript cluster?
>>
>
> They can. It depends on changes between the current genome build and the
> one on which the original probeset/transcript clusters were based. Given
> the maturity of the Human Genome, I wouldn't expect massive changes.
>
>
>> Broader question: what information from the annotation files is actually
>> used by pdInfoBuilder?
>>
>
> This is something you could explore for yourself. If you go to the svn (
> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks), using
> readonly for both the password and user name, and look at the source for
> pdBuilderV2HTA2.R, you can see this near the top, in the function
> parseHtaProbesetCSV():
>
>
> cols <- c("probeset_id", "seqname", "strand", "start", "stop",
> "transcript_cluster_id", "exon_id",
> "crosshyb_type", "level", "probeset_type",
> "junction_start_edge", "junction_stop_edge",
> "junction_sequence", "has_cds")
>
> So all of this information is parsed out of the probeset CSV file. If
> there are changes to the current human genome that would imply that a
> particular probe or probeset no longer measures what Affy originally
> intended (or if the strand, start, or stop position changes), then the
> changes would be reflected here, and would then be passed to the pd.hta.2.0
> package that you built.
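One way to check this for a specific pair of releases is to diff the two probeset CSV files directly. What follows is only a sketch in base R, not something pdInfoBuilder provides: the function name compare_probeset_annotations() is made up for illustration, and the choice of columns to compare is an assumption based on the column list quoted above. For the real files you would first read each CSV with something like read.csv(path, comment.char = "#", stringsAsFactors = FALSE), assuming the NetAffx header lines start with '#'; the toy data frames here just stand in for those.

```r
# Sketch: report probesets whose annotation differs between two releases.
# `old` and `new` are data frames with a probeset_id column, e.g. as read
# from the na33 and na34 probeset CSV files (paths not shown here).
compare_probeset_annotations <- function(old, new,
                                         cols = c("transcript_cluster_id",
                                                  "strand", "start", "stop")) {
  stopifnot("probeset_id" %in% names(old), "probeset_id" %in% names(new))
  # Only compare columns that are present in both tables.
  cols <- intersect(cols, intersect(names(old), names(new)))
  both <- merge(old[, c("probeset_id", cols), drop = FALSE],
                new[, c("probeset_id", cols), drop = FALSE],
                by = "probeset_id", suffixes = c(".old", ".new"))
  changed <- rep(FALSE, nrow(both))
  for (col in cols) {
    a <- as.character(both[[paste0(col, ".old")]])
    b <- as.character(both[[paste0(col, ".new")]])
    # A value changed if exactly one side is NA, or both are set and differ.
    changed <- changed | (is.na(a) != is.na(b)) |
      (!is.na(a) & !is.na(b) & a != b)
  }
  list(changed  = both[changed, , drop = FALSE],
       only_old = setdiff(old$probeset_id, new$probeset_id),
       only_new = setdiff(new$probeset_id, old$probeset_id))
}

# Toy stand-ins for the two releases (made-up IDs and coordinates).
old <- data.frame(probeset_id = c("PS1", "PS2", "PS3"),
                  transcript_cluster_id = c("TC1", "TC1", "TC2"),
                  start = c(100, 200, 300), stringsAsFactors = FALSE)
new <- data.frame(probeset_id = c("PS1", "PS2", "PS4"),
                  transcript_cluster_id = c("TC1", "TC9", "TC2"),
                  start = c(100, 200, 400), stringsAsFactors = FALSE)

res <- compare_probeset_annotations(old, new)
res$changed$probeset_id  # probesets present in both whose columns differ
res$only_old             # probesets dropped in the newer release
res$only_new             # probesets added in the newer release
```

If res$changed comes back empty for transcript_cluster_id on the real files, the probeset-to-cluster assignments are the same in both releases, at least as far as the CSV is concerned.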
>
> The transcript CSV file is used for much less. AFAIK, that file is just
> parsed and put into the extdata directory of the package:
>
>
> #######################################################################
> ## Part vi) Save NetAffx Annotation to extdata
> #######################################################################
> if (!quiet) message("Saving NetAffx Annotation... ", appendLF=FALSE)
> netaffxProbeset <- annot2fdata(object@probeFile)
> save(netaffxProbeset, file=file.path(extdataDir, 'netaffxProbeset.rda'),
>      compress='xz')
> netaffxTranscript <- annot2fdata(object@transFile)
> save(netaffxTranscript, file=file.path(extdataDir, 'netaffxTranscript.rda'),
>      compress='xz')
>
> And you can see what that looks like by doing:
>
> load(paste0(path.package("pd.hta.2.0"), "/extdata/netaffxTranscript.rda"))
>
> and then
>
> head(pData(netaffxTranscript))
>
> but I don't think these data are currently used for anything.
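A slightly more defensive variant of the load() call above, in case it is useful: this sketch looks the file up with system.file(), which returns an empty string when the package or file is missing, and loads it into a private environment so nothing in the workspace gets overwritten. The helper name load_rda_object() is hypothetical. Inspecting the result with pData() would additionally need Biobase attached, which is left out here.

```r
# Load one named object from an .rda file into a private environment.
# Returns NULL if the file is missing or does not contain `name`.
load_rda_object <- function(path, name) {
  if (!nzchar(path) || !file.exists(path)) return(NULL)
  env <- new.env()
  loaded <- load(path, envir = env)  # names of the objects that were loaded
  if (!name %in% loaded) return(NULL)
  get(name, envir = env)
}

# system.file() yields "" if pd.hta.2.0 (or the file) is not installed.
rda <- system.file("extdata", "netaffxTranscript.rda", package = "pd.hta.2.0")
netaffxTranscript <- load_rda_object(rda, "netaffxTranscript")
if (is.null(netaffxTranscript)) {
  message("netaffxTranscript.rda not found; is pd.hta.2.0 installed?")
} else {
  print(class(netaffxTranscript))
}
```

The same helper works for netaffxProbeset.rda by changing the file and object names.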
>
> Best,
>
> Jim
>
>
>
>
>>
>> Any help appreciated.
>>
>> Thanks,
>>
>> Guilherme Rocha
>>
>>
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Construction of the package:
>>
>> library(pdInfoBuilder)
>>
>> setwd("/my_bioc_packages/")
>>
>> seed <- new("AffyHTAPDInfoPkgSeed",
>> version = "3.8.0",
>> license = "Artistic-2.0",
>> pgfFile = ".../HTA-2_0.r1.pgf",
>> clfFile = ".../HTA-2_0.r1.clf",
>> probeFile = ".../HTA-2_0.na33.hg19.probeset.csv",
>> transFile = ".../HTA-2_0.na33.1.hg19.transcript.csv",
>> coreMps = ".../HTA-2_0.r1.Psrs.mps",
>> geneArray = TRUE,
>> author = "gvrocha",
>> email = "gvrocha at gmail.com",
>> biocViews = "AnnotationData",
>> genomebuild = "hg19",
>> organism = "Homo sapiens",
>> species = "Homo sapiens",
>> url = "http://about.me/gvrocha")
>>
>> makePdInfoPackage(seed, destDir=".")
>>
>>
>> --
>> Guilherme V. Rocha
>> gvrocha at gmail.com
>>
>>
>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
--
Guilherme V. Rocha
gvrocha at gmail.com