[BioC] Error in locateVariants
Valerie Obenchain
vobencha at fhcrc.org
Thu Jul 25 00:34:48 CEST 2013
Thanks for sending the files. Please keep the conversation 'on-list' by
cc'ing bioconductor at r-project.org.
There was a bug in locateVariants() related to 'features' not having
txid or cdsid. That is now fixed in devel (1.7.38) and release (1.6.7).
Thanks for reporting this.
A couple things to note:
(1) The second argument to readVcf() is 'genome'. Here you are setting
the genome to the string 'Pinfestans.fa' string which is probably not
what you intended to do.
vcf1<-readVcf("CGTACG_filteredSNPS.vcf","Pinfestans.fa")
(2) When 'features' is a GRangesList it must be an appropriate
extraction to match the 'region' argument (explained on the man page).
You are interested in CodingVariants so the GRangesList must be
equivalent to a cdsBy(txdb, "transcript") extraction. The
'myGRangesList' object is of length 1 with 801 different seqlevels in
the single list element.
myGRangesList<-GRangesList(myGRanges)
> length(myGRangesList)
[1] 1
> length(seqlevels(myGRangesList))
[1] 801
The idea is that cds regions from the same transcript (i.e., same
seqlevel) are grouped in list elements. Assuming that the seqlevels in
this file are transcript names, you could subset by 'type' and split on
seqname to get cds by transcripts.
> cds <- myGRanges[myGRanges$type == "CDS"]
> cdsby <- split(cds, seqnames(cds))
> length(cdsby)
[1] 801
An alternative approach to is to make a TranscriptDb from your gff file
with makeTranscriptDbFromGFF() in GenomicFeatures. Either the txdb or
cdsby extraction can be used as 'features'.
txdb <- makeTranscriptDbFromGFF("annotation.gff3")
cdsby <- cdsBy(txdb, "transcript")
Oops. Looks like this gff file has something strange going on. I'll send
the report to Marc.
> txdb <-
makeTranscriptDbFromGFF("PI_T30-4_FINAL_CALLGENES_3.annotation.gff3")
Error in .parse_attrCol(attrCol, file, colnames) :
Some attributes do not conform to 'tag=value' format
At any rate, makeTranscriptDbFromGFF() can be a useful function to
convert gff to txdb for well behaved files.
Valerie
On 07/24/2013 01:34 PM, Valerie Obenchain wrote:
> Hi Jolly,
>
> I need a reproducible example in order to help. Are 'rd' and
> 'myGRangesList' too large to send? Also provide the output of
> sessionInfo().
>
> Valerie
>
>
> On 07/24/2013 11:38 AM, Jolly Shrivastava wrote:
>> Hello Valerie,
>>
>> I am running VariantAnnotation package to locate the position of SNPs
>> using
>> locateVariants.
>>
>> The commands that I gave to extract vcf file and the gff file are as
>> follows:
>>
>> ###Reading the vcf file
>> vcf1<-readVcf("CGTACG_filteredSNPS.vcf","Pinfestans.fa")
>> rd<-rowData(vcf1)
>>
>> ####imported and converted the gff3 file to GRangesList object
>> subject<-import.gff("annotation.gff3")
>> myGRanges<-as(subject,"GRanges")
>> myGRangesList<-GRangesList(myGRanges)
>>
>>
>> ### made sure that both of them have same sequence levels
>> rd<-keepSeqlevels(rd,intersect(seqlevels(rd),seqlevels(myGRangesList))
>> myGRangesList<-keepSeqlevels(myGRangesList,intersect(seqlevels(rd),seqlevels(myGRangesList))
>>
>>
>> ###Ran the locateVariants command
>> variants_found<-locateVariants(rd,myGRangesList,CodingVariants())
>>
>> ###Got the following error
>> Error in DataFrame(...) : different row counts implied by arguments
>>
>> Can you please point to what I am doing wrong?
>>
>> Regards
>> Jolly
>>
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list