[Bioc-devel] seqnames missing in headerTabix()

Valerie Obenchain vobencha at fhcrc.org
Tue Aug 23 18:14:50 CEST 2011


Hi Anita,

It looks like the download may not have worked. Check your gtfFn file to
see whether the data are really there:

     less Drosophila_melanogaster.BDGP5.25.62.gtf.gz
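The same check can be done from within R. Below is a minimal sketch: peekGz() is a hypothetical helper, not an Rsamtools function. It reports the size of the gzipped file and returns its first few lines. An empty result or a tiny file size suggests the download failed.

```r
## Hypothetical helper (not part of Rsamtools): report the on-disk size of
## a gzipped file and return its first 'n' lines, as a quick check that
## the download actually contains data.
peekGz <- function(path, n = 5L) {
    if (!file.exists(path))
        stop("file not found: ", path)
    message("size on disk: ", file.info(path)$size, " bytes")
    con <- gzfile(path, open = "r")
    on.exit(close(con))          # always release the connection
    readLines(con, n = n)        # fewer than 'n' lines if the file is short
}

## Usage with the file name from the original message:
## peekGz("Drosophila_melanogaster.BDGP5.25.62.gtf.gz")
```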

Once you are sure of the download, you may want to check the file for the
usual things:
(1) no comment lines starting with #
(2) the file is tab-separated, not space-separated

Coming from Ensembl, these should not be a problem.
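Both checks can be scripted in R. The sketch below uses checkGtfLines(), a hypothetical helper (not part of Rsamtools). It reads the first 'n' lines of the gzipped file, counts lines starting with "#", and counts non-comment lines that do not have exactly 8 tab characters, i.e. lines that are not the 9 tab-separated columns a GTF record should have. It only samples the start of the file, so treat it as a quick heuristic rather than a full validation.

```r
## Hypothetical helper (not part of Rsamtools): scan the first 'n' lines
## of a gzipped GTF/GFF file for comment lines and for records that are
## not split into 9 tab-separated columns.
checkGtfLines <- function(path, n = 1000L) {
    con <- gzfile(path, open = "r")
    on.exit(close(con))
    lines <- readLines(con, n = n)
    isComment <- startsWith(lines, "#")
    body <- lines[!isComment]
    ## a 9-column tab-separated record contains exactly 8 tab characters;
    ## space-separated lines contain none and are flagged here
    nTabs <- nchar(gsub("[^\t]", "", body))
    list(commentLines = sum(isComment),
         linesNotNineTabFields = sum(nTabs != 8L))
}

## Usage with the file name from the original message:
## checkGtfLines("Drosophila_melanogaster.BDGP5.25.62.gtf.gz")
```

Non-zero counts in either field point at the file layout rather than at the indexing call.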

Valerie


On 08/23/2011 07:02 AM, Anita Lerch wrote:
> Hi,
>
> I tried to stream a 'gtf' file from Ensembl with the Tabix methods.
> Creating the index file seems to work, but when I checked it
> with headerTabix(tbx)$seqnames I got character(0).
> Of course scanTabix() didn't work then either.
> I do not have this problem with the example file in the Rsamtools
> package.
> Does anybody have an explanation for this?
>
> Thanks in advance,
> Anita
>
>> library(Rsamtools)
>> url<- "ftp://ftp.ensembl.org/pub/release-62/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP5.25.62.gtf.gz"
>> gtfFn<- "Drosophila_melanogaster.BDGP5.25.62.gtf.gz"
>> download.file(url, gtfFn, "wget")
>> indexTabix(gtfFn, format="gff")
> [1] "Drosophila_melanogaster.BDGP5.25.62.gtf.gz.tbi"
>> tbx<- open(TabixFile(gtfFn))
>> headerTabix(tbx)
> $seqnames
> character(0)
>
> $indexColumns
>    seq start   end
>      1     4     5
>
> $skip
> [1] 0
>
> $comment
> [1] "#"
>
> $header
> character(0)
>
>> seqnamesTabix(tbx)
> character(0)
>> cat(yieldTabix(tbx, yieldSize=1L))
>> param<- GRanges(c("3L", "3R"), IRanges(c(1, 1), width=100000))
>> scanTabix(tbx, param=param)
> Error: scanTabix: '3L' not present in tabix index
>    path: /home_fmi/01/lerchani/workspace/Drosophila_melanogaster.BDGP5.25.62.gtf.gz
>
>> sessionInfo()
> R Under development (unstable) (2011-08-23 r56776)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] Rsamtools_1.5.51     Biostrings_2.21.9    GenomicRanges_1.5.28 IRanges_1.11.24
>
> loaded via a namespace (and not attached):
> [1] BSgenome_1.21.3     RCurl_1.6-9         rtracklayer_1.13.11 tools_2.14.0        XML_3.4-2           zlibbioc_0.1.7
>


