[BioC] ReadVcf Memory Issues
Martin Morgan
mtmorgan at fhcrc.org
Fri Jul 20 05:05:40 CEST 2012
On 07/19/2012 05:59 PM, Timothy Duff wrote:
> Hi. I am trying to determine, from a filtered set of Illumina 450k probes,
> which of them occur with a specified frequency in a given population. I was
> referred to the VariantAnnotation package by this mailing list. While the
> readVcf function seems to handle small loads nicely, looping over the
> regions of interest seems to cause allocation troubles. R tells me "Realloc
> could not re-allocate memory (0 bytes)" after about 4 iterations. Below is
I think this is a bug in the release version of VariantAnnotation. It is
fixed in devel v. 1.3.6 (current devel version is 1.3.16) and will be
fixed in release version 1.2.10, which will probably be built Saturday
morning, 10am Seattle time. The short-term solution is to switch to the
devel branch (http://bioconductor.org/developers/useDevel/), but the bug
might be avoided anyway by re-coding as suggested by Vince.
Martin
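
The version numbers in this reply can be compared directly to see whether a given installation predates the fix. A minimal sketch in base R; `has_realloc_fix` is a hypothetical helper name, and the thresholds are only the two versions named in this message (1.2.10 on the release branch, 1.3.6 on devel):

```r
# Hypothetical helper: TRUE when a VariantAnnotation version string is at or
# past the fixed versions named in this message
# (release branch >= 1.2.10, devel branch >= 1.3.6).
has_realloc_fix <- function(version) {
  v <- package_version(version)
  if (v >= "1.3.0") v >= "1.3.6" else v >= "1.2.10"
}

has_realloc_fix("1.2.9")   # the release version reported in this thread
has_realloc_fix("1.3.16")  # the current devel version
```

For an installed package, `as.character(packageVersion("VariantAnnotation"))` gives the string to pass in.
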
> the relevant code, and below it the output of sessionInfo(). If anyone
> might suggest some diagnostic measures or an alternate way of doing this I
> would appreciate it. Thanks.
>
> ------
>
> library(VariantAnnotation)
> library(IlluminaHumanMethylation450kprobe)
>
> filename <- "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/supporting/EUR.2of4intersection_allele_freq.20100804.genotypes.vcf.gz"
>
> load("rsids.Rdata") # a data frame containing probe id, rs id, and chromosome
> data(IlluminaHumanMethylation450kprobe)
> colnames(rsids) <- c("Probe_ID", "RS_ID", "CHR")
> m <- merge(IlluminaHumanMethylation450kprobe, rsids, by="Probe_ID")
>
> for (i in seq_along(m$Probe_ID)) {
>     # read the VCF records overlapping the i-th probe
>     snprange <- readVcf(TabixFile(filename), "hg19",
>                         param=GRanges(as.character(m$CHR[i]),
>                                       IRanges(as.integer(m$start[i]),
>                                               as.integer(m$end[i]))))
>     freq <- elementMetadata(info(snprange))["EUR_R2"][1,1]
>     # flag the probe when the first record's EUR_R2 lies strictly in (.01, .99)
>     if (!is.na(freq) && freq < .99 && freq > .01) {
>         m$CpGs[i] <- 1
>     } else {
>         m$CpGs[i] <- 0
>     }
> }
>
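
Vince's suggestion is not quoted in this message. One way the loop above might be re-coded, assuming the idea is to query all probe ranges in a single readVcf() call and to classify the frequencies in one vectorised step, is sketched below. The helper names `flag_in_range` and `read_all_probe_ranges` are hypothetical and do not come from the thread; the sketch also assumes `m` has the CHR, start and end columns used in the loop.

```r
# Vectorised form of the if/else in the loop: 1 where the frequency is
# known and strictly between the bounds, 0 otherwise (including NA).
flag_in_range <- function(freq, lower = 0.01, upper = 0.99) {
  as.integer(!is.na(freq) & freq > lower & freq < upper)
}

# Hypothetical helper: build one GRanges covering every probe and read all
# overlapping VCF records with a single readVcf() call, so the Tabix file is
# opened and queried once instead of once per probe.
read_all_probe_ranges <- function(filename, m) {
  rng <- GenomicRanges::GRanges(
    as.character(m$CHR),
    IRanges::IRanges(as.integer(m$start), as.integer(m$end))
  )
  VariantAnnotation::readVcf(Rsamtools::TabixFile(filename), "hg19",
                             param = rng)
}

flag_in_range(c(0.5, NA, 0.995, 0.01))
```

The records returned by a single call are not grouped by probe, so they would still need to be matched back to the rows of `m` (for example by overlap with the query ranges); that step is left out here.
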
>
> ----
>
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] IlluminaHumanMethylation450kprobe_2.0.6
> [2] AnnotationDbi_1.18.1
> [3] Biobase_2.16.0
> [4] BiocInstaller_1.4.7
> [5] VariantAnnotation_1.2.9
> [6] Rsamtools_1.8.5
> [7] Biostrings_2.24.1
> [8] GenomicRanges_1.8.7
> [9] IRanges_1.14.4
> [10] BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.2.0 bitops_1.0-4.1 BSgenome_1.24.0
> [4] DBI_0.2-5 GenomicFeatures_1.8.2 grid_2.15.1
> [7] lattice_0.20-6 Matrix_1.0-6 RCurl_1.4-3
> [10] RSQLite_0.9-2 rtracklayer_1.16.3 snpStats_1.6.0
> [13] splines_2.15.1 stats4_2.15.1 survival_2.36-14
> [16] tools_2.15.1 XML_3.9-4 zlibbioc_1.2.0
>
>
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793