[BioC] VariantAnnotation: Specifying 'seqinfo' at import with 'readVcf'

Tue Sep 24 18:36:03 CEST 2013

Hi Valerie,

In this case, I'm not concerned about reading only a part of the VCF file.

When I call 'readVcf', a 'GRanges' object gets created, along with the 
corresponding 'seqinfo' slot.  I was trying to find a way to feed the 
'seqinfo' information directly into the construction of the VCF, rather 
than changing it after the VCF object has already been created.  Is 
there a way to do this?
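
For concreteness, a minimal sketch of the post-hoc route described above. 
It assumes the example file 'chr22.vcf.gz' shipped with VariantAnnotation; 
'seq_info' is a hypothetical stand-in for the real genome's Seqinfo and 
is not taken from an actual analysis.

```r
library(VariantAnnotation)

# Example VCF shipped with the package (assumption: stands in for a large file)
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")

# Step 1: import; the 'seqinfo' is derived from the file and 'genome'
vcf <- readVcf(fl, "hg19")

# Hypothetical target Seqinfo (hg19 length of chromosome 22)
seq_info <- Seqinfo(seqnames = "22", seqlengths = 51304566,
                    isCircular = FALSE, genome = "hg19")

# Step 2: replace the 'seqinfo' after the object has been built
seqinfo(vcf) <- seq_info
seqinfo(vcf)
```

The second step is the one that would ideally be folded into the import.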

Best wishes
Julian

On 09/24/2013 06:31 PM, Valerie Obenchain wrote:
> Hi Julian,
>
> On 09/24/2013 02:29 AM, Julian Gehring wrote:
>> Hi,
>>
>> Is there a direct way to specify the 'seqinfo' of a genome for the
>> import of a VCF file using 'readVcf'?
>
> I think the question is how to read in a subset of chromosomes/positions
> from a vcf file without an accompanying tabix index. You can't.
> readVcf() requires an index when subsets are defined by
> chromosome/position. However you can read in subsets defined by INFO
> and/or GENO fields without an index.
>
> Approaches:
> (1) create index with ?indexTabix and specify 'which' in ScanVcfParam
> (2) use ?filterVcf to write out a new file of records of interest
>
>> I'm aware that one can change it
>> with the 'seqinfo' method afterwards, but for large VCF files this can
>> take a significant amount of time.
>
> What operation is taking a long time? Subsetting the VCF object by
> chromosome?
>
> Valerie
>
>>
>> An alternative would be to sneak it in by the 'which' arguments, such as:
>>
>> readVcf(file, genome, ScanVcfParam(which = as(seq_info, "GRanges")))
>>
>> but this requires the file to be indexed beforehand.
>>
>> Best wishes
>> Julian
>>
>
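
Approach (1) from the quoted reply, combined with the 'which' idea, can be 
sketched roughly as follows. This is an illustration under assumptions, 
not a tested recipe for large files: it uses the bgzip-compressed example 
file 'chr22.vcf.gz' from VariantAnnotation, copies it to a temporary 
directory so that an index can be written next to it, and uses a 
hypothetical 'seq_info' object. Whether the imported object then carries 
the full 'seqinfo' is not something this sketch guarantees.

```r
library(VariantAnnotation)
library(Rsamtools)

# Copy the example file so the index can be written alongside it
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
tmp <- file.path(tempdir(), "chr22.vcf.gz")
file.copy(fl, tmp, overwrite = TRUE)

# Create the tabix index (the file must be bgzip-compressed)
idx <- indexTabix(tmp, format = "vcf")

# Hypothetical Seqinfo describing the sequences of interest
seq_info <- Seqinfo(seqnames = "22", seqlengths = 51304566,
                    isCircular = FALSE, genome = "hg19")

# Pass the sequences as ranges via 'which'; this is what needs the index
param <- ScanVcfParam(which = as(seq_info, "GRanges"))
vcf <- readVcf(tmp, "hg19", param = param)
seqinfo(vcf)
```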