[Bioc-sig-seq] 'coverage' error message (2)

Fri Jan 8 18:28:04 CET 2010

I have only spend 5s reading your email.

But it seems you have the incredible common problem of chromosome name mismatch.  For a given organism there is no standard for how the chr are named.  Depending on the source (genome, what source, what annotation etc), I have -- for the same genome -- seen at least
 chr1
 chr01
 chrI  (<- roman 1)
 scchr1
I would not find it strange at all if whoever aligned using ELAND used a different chr. name.  This is pretty hard to make work automatically.  You usually have to modify the objects yourself.

Kasper

On Jan 7, 2010, at 15:59 PM, pterry at huskers.unl.edu wrote:

> Dear bioc-sig-sequencing,
> 
> I am trying to analyze Eland aligned files (Arabidopsis) for differential expression, using as a guide the 'A ChIP-Seq Data Analysis' handout from a 11/19/09 session at the 'High throughput sequence analysis tools and approaches with Bioconductor' workshop in Seattle.
> 
> I generated the error message in the following output.  Can you comment?
> 
> Notes:
> 
> i. This is my second email on this problem.  My first email omitted some info which appears to me may have affected the response by persons monitoring the mailing list.
> 
> ii. One misunderstanding would appear to be that I used the mouse 'reads' data supplied during the lab.  Instead, I used what I'm told is Eland-aligned Arabidopsis data (s_8_export_chr1.txt file derived from s_8_export.txt using grep).
> 
> iii. The error message suggests a mismatch between:
>> names(arab.chromlens)
> [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chrC" "chrM"
> and
>> levels(chromosome(alns_8))
> [1] "chr1.fas"
> 
> I don't know what to do about this 'mismatch'?  Perhaps I need to arrange so:
>> names(arab.chromlens)
> gives output of only:
> "chr1"?
> 
> iv. I note using 'available.genomes()', there are two BSgenome data packages for Arabidopsis.  Could my arbitrary choice be a problem?  Would one have to coordinate the choice with the Arabidopsis genome Eland must have used during alignment?
> 
> 
> ...
> 
>> cerudataDir <- "/home/mterry/data09/wang_892spr09/rob_hw6/ceru_data"
>> cerudataDir
> [1] "/home/mterry/data09/wang_892spr09/rob_hw6/ceru_data"
> 
>> pattern <- "s_8_export_chr1.txt"
>> list.files(cerudataDir, pattern)
> [1] "s_8_export_chr1.txt"
>> filt1 <- alignDataFilter(expression(filtering=="Y"))
>> filt2 <- chromosomeFilter("chr[0-9XYM]+.fa")
>> filt <- compose(filt1, filt2)
>> alns_8 <- readAligned(cerudataDir, pattern, type="SolexaExport",
> + filter=filt)
>> alns_8
> class: AlignedRead
> length: 1022848 reads; width: 35 cycles
> chromosome: chr1.fas chr1.fas ... chr1.fas chr1.fas
> position: 7568294 167488 ... 4687256 5376960
> strand: + + ... + +
> alignQuality: NumericQuality
> alignData varLabels: run lane ... filtering contig
> 
>> levels(alns_8 at chromosome) <- sub(".fa$", "", levels(chromosome(alns_8)))
> 
>> head(levels(alns_8 at chromosome))
> [1] "chr1.fas"
>> levels(chromosome(alns_8))
> [1] "chr1.fas"
>> library(BSgenome.Athaliana.TAIR.04232008)
>> arab.chromlens <- seqlengths(Athaliana)
>> head(arab.chromlens)
>    chr1     chr2     chr3     chr4     chr5     chrC
> 30432563 19705359 23470805 18585042 26992728   154478
>> names(arab.chromlens)
> [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chrC" "chrM"
>> cov.arab8 <- coverage(alns_8, width = arab.chromlens, extend = 126L)
> Error: UserArgumentMismatch
>  'names(width)' (or 'names(end)') mismatch with 'levels(chromosome(x))'
>  see ?"AlignedRead-class"
> 
>> sessionInfo()
> R version 2.10.1 (2009-12-14)
> x86_64-pc-linux-gnu
> 
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] BSgenome.Athaliana.TAIR.04232008_1.3.16
> [2] chipseq_0.2.1
> [3] ShortRead_1.4.0
> [4] lattice_0.17-26
> [5] BSgenome_1.14.2
> [6] Biostrings_2.14.10
> [7] IRanges_1.4.9
> 
> loaded via a namespace (and not attached):
> [1] Biobase_2.6.1 grid_2.10.1   hwriter_1.1
>> 
> 
> 
> Thanks,
> P. Terry
> pterry at huskers.unl.edu
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing