[BioC] VariantAnnotation: Specifying 'seqinfo' at import with 'readVcf'
Julian Gehring
julian.gehring at embl.de
Tue Sep 24 18:36:03 CEST 2013
Hi Valerie,
In this case, I'm not concerned about reading only a part of the VCF file.
When I call 'readVCF', a 'GRanges' object gets created and also the
corresponding 'seqinfo' slot. I was trying to find a way to feed the
'seqinfo' information directly in the construction of the VCF, rather
than changing it after the VCF object already has been created. Is
there a way to do this?
Best wishes
Julian
On 09/24/2013 06:31 PM, Valerie Obenchain wrote:
> Hi Julian,
>
> On 09/24/2013 02:29 AM, Julian Gehring wrote:
>> Hi,
>>
>> Is there a direct way to specifiy the 'seqinfo' of a genome for the
>> import of a VCF file using 'readVcf'?
>
> I think the question is how to read in a subset of chromosomes/positions
> from a vcf file without an accompanying tabix index. You can't.
> readVcf() requires an index when subsets are defined by
> chromosome/position. However you can read in subsets defined by INFO
> and/or GENO fields without an index.
>
> Approaches:
> (1) create index with ?indexTabix and specify 'which' in ScanVcfParam
> (2) use ?filterVcf to write out a new file of records of interest
>
>> I'm aware that one can change it
>> with the 'seqinfo' method afterwards, but for large VCF files this can
>> take a significant amount of time.
>
> What operation is taking along time? Subsetting the VCF object by
> chromosome?
>
> Valerie
>
>>
>> An alternative would be to sneak it in by the 'which' arguments, such as:
>>
>> readVcf(file, genome, ScanVcfParam(which = as(seq_info, "GRanges")))
>>
>> but this requires the file to be indexed beforehand.
>>
>> Best wishes
>> Julian
>>
>
More information about the Bioconductor
mailing list