[BioC] VariantAnnotation - extracting AD and PL fields

Stephanie M. Gogarten sdmorris at u.washington.edu
Tue Apr 8 17:52:35 CEST 2014



On 4/8/14 4:51 AM, Michael Lawrence wrote:
> Do you really just want to see the values, or is the goal to compute on
> them? That AD field seems a bit strange, because the first sample is
> missing values. By convention, the AD has values for the ref and alts,
> across all of the samples.  In order to really work with these data, you'll
> need to convert that list matrix to a cube with NAs as padding. I wouldn't
> be able to suggest how without having my hands on the actual matrix.

VariantAnnotation has an internal function .matrixOfListsToArray that 
does exactly this.  Maybe it should be exported?
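
For anyone who wants to see the idea without reaching into the package internals, here is a rough base-R sketch of the padding step. It is not the code of .matrixOfListsToArray; the toy matrix AD and the name arr are made up for illustration, with the first cell left empty to mimic the missing values in Sample1.

```r
# Toy stand-in for geno(vcf)$AD: a matrix whose cells are integer vectors.
# (Hypothetical data; the first cell is empty to mimic a missing sample.)
AD <- matrix(list(integer(0), c(5L, 40L), c(10L, 18L), c(0L, 23L)),
             nrow = 2,
             dimnames = list(c("chrM:72_T/C", "chrM:73_G/A"),
                             c("Sample1", "Sample2")))

# Longest vector in any cell; shorter cells are padded up to this length.
n <- max(vapply(AD, length, integer(1)))

# Indexing past the end of a vector yields NA, which does the padding.
padded <- unlist(lapply(AD, function(v) v[seq_len(n)]))

# 'padded' runs value-by-value within each cell, cell-by-cell in column-major
# order, so it fills an n x nrow x ncol array; aperm() then moves the value
# index to the third dimension.
arr <- aperm(array(padded,
                   dim = c(n, nrow(AD), ncol(AD)),
                   dimnames = list(NULL, rownames(AD), colnames(AD))),
             c(2, 3, 1))

arr["chrM:73_G/A", "Sample2", ]  # the AD values for one variant and sample
```

The result is a variants x samples x values array, so arr[, , 1] would hold the first AD value for every variant and sample, with NA wherever a cell was shorter than the longest one.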

>
> If you really do just want to see them, you can do
> unstrsplit(CharacterList(AD), ",") in devel, or use
> rtracklayer:::pasteCollapse instead of unstrsplit in release. Then wrap
> that result back into a matrix.
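
If neither of those helpers is at hand, the same display can be approximated in base R alone. The following is only a sketch on a made-up matrix (AD and ADchar are illustrative names, and the values are the ones from the desired output in the original question); it swaps in paste() for unstrsplit.

```r
# Toy stand-in for geno(vcf)$AD, filled with the values from the question.
AD <- matrix(list(c(6L, 2L), c(5L, 40L), c(10L, 18L), c(0L, 23L)),
             nrow = 2,
             dimnames = list(c("chrM:72_T/C", "chrM:73_G/A"),
                             c("Sample1", "Sample2")))

# Collapse each cell to one comma-separated string, then wrap the result
# back into a matrix with the original dimensions and names.
ADchar <- matrix(vapply(AD, paste, character(1), collapse = ","),
                 nrow = nrow(AD), dimnames = dimnames(AD))

print(ADchar, quote = FALSE)
```

An empty cell collapses to an empty string here, so missing samples show up as blanks rather than NA.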
>
>
>
> On Mon, Apr 7, 2014 at 10:53 PM, Lavinia Gordon
> <Lavinia.Gordon at agrf.org.au> wrote:
>
>> Dear All
>>
>> Using VariantAnnotation to parse a vcf file:
>>
>> ADvcf <- geno(vcf)$AD
>>
>> How can I access these values?
>>
>> ADvcf[1:2,1:2]
>>              Sample1   Sample2
>> chrM:72_T/C Integer,0 Integer,2
>> chrM:73_G/A Integer,0 Integer,2
>>
>> as ideally I'd like something like:
>> ADvcf[1:2,1:2]
>>              Sample1   Sample2
>> chrM:72_T/C      6,2     10,18
>> chrM:73_G/A     5,40      0,23
>>
>> Thank you.
>>
>> Lavinia Gordon
>> Bioinformatics Manager
>>
>> Australian Genome Research Facility Ltd
>> The Walter and Eliza Hall Institute
>> 1G Royal Parade
>> Parkville VIC 3050
>> Australia
>>
>> sessionInfo()
>> R version 3.0.3 (2014-03-06)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] VariantAnnotation_1.8.13 Rsamtools_1.14.3         Biostrings_2.30.1
>> [4] GenomicRanges_1.14.4     XVector_0.2.0            IRanges_1.20.7
>> [7] BiocGenerics_0.8.0
>>
>> loaded via a namespace (and not attached):
>>  [1] AnnotationDbi_1.24.0   Biobase_2.22.0         biomaRt_2.18.0
>>  [4] bitops_1.0-6           BSgenome_1.30.0        DBI_0.2-7
>>  [7] GenomicFeatures_1.14.5 RCurl_1.95-4.1         RSQLite_0.11.4
>> [10] rtracklayer_1.22.7     stats4_3.0.3           tools_3.0.3
>> [13] XML_3.98-1.1           zlibbioc_1.8.0
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>


