[Bioc-sig-seq] 'coverage' error message (2)

Thu Jan 7 21:59:47 CET 2010

Dear bioc-sig-sequencing,

I am trying to analyze Eland aligned files (Arabidopsis) for differential expression, using as a guide the 'A ChIP-Seq Data Analysis' handout from a 11/19/09 session at the 'High throughput sequence analysis tools and approaches with Bioconductor' workshop in Seattle.

I generated the error message in the following output.  Can you comment?

Notes:

i. This is my second email on this problem.  My first email omitted some info which appears to me may have affected the response by persons monitoring the mailing list.

ii. One misunderstanding would appear to be that I used the mouse 'reads' data supplied during the lab.  Instead, I used what I'm told is Eland-aligned Arabidopsis data (s_8_export_chr1.txt file derived from s_8_export.txt using grep).

iii. The error message suggests a mismatch between:
> names(arab.chromlens)
[1] "chr1" "chr2" "chr3" "chr4" "chr5" "chrC" "chrM"
and
> levels(chromosome(alns_8))
[1] "chr1.fas"

I don't know what to do about this 'mismatch'?  Perhaps I need to arrange so:
> names(arab.chromlens)
gives output of only:
"chr1"?

iv. I note using 'available.genomes()', there are two BSgenome data packages for Arabidopsis.  Could my arbitrary choice be a problem?  Would one have to coordinate the choice with the Arabidopsis genome Eland must have used during alignment?

...

> cerudataDir <- "/home/mterry/data09/wang_892spr09/rob_hw6/ceru_data"
> cerudataDir
[1] "/home/mterry/data09/wang_892spr09/rob_hw6/ceru_data"

> pattern <- "s_8_export_chr1.txt"
> list.files(cerudataDir, pattern)
[1] "s_8_export_chr1.txt"
> filt1 <- alignDataFilter(expression(filtering=="Y"))
> filt2 <- chromosomeFilter("chr[0-9XYM]+.fa")
> filt <- compose(filt1, filt2)
> alns_8 <- readAligned(cerudataDir, pattern, type="SolexaExport",
+ filter=filt)
> alns_8
class: AlignedRead
length: 1022848 reads; width: 35 cycles
chromosome: chr1.fas chr1.fas ... chr1.fas chr1.fas
position: 7568294 167488 ... 4687256 5376960
strand: + + ... + +
alignQuality: NumericQuality
alignData varLabels: run lane ... filtering contig

> levels(alns_8 at chromosome) <- sub(".fa$", "", levels(chromosome(alns_8)))

> head(levels(alns_8 at chromosome))
[1] "chr1.fas"
> levels(chromosome(alns_8))
[1] "chr1.fas"
> library(BSgenome.Athaliana.TAIR.04232008)
> arab.chromlens <- seqlengths(Athaliana)
> head(arab.chromlens)
    chr1     chr2     chr3     chr4     chr5     chrC
30432563 19705359 23470805 18585042 26992728   154478
> names(arab.chromlens)
[1] "chr1" "chr2" "chr3" "chr4" "chr5" "chrC" "chrM"
> cov.arab8 <- coverage(alns_8, width = arab.chromlens, extend = 126L)
Error: UserArgumentMismatch
  'names(width)' (or 'names(end)') mismatch with 'levels(chromosome(x))'
  see ?"AlignedRead-class"

> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BSgenome.Athaliana.TAIR.04232008_1.3.16
[2] chipseq_0.2.1
[3] ShortRead_1.4.0
[4] lattice_0.17-26
[5] BSgenome_1.14.2
[6] Biostrings_2.14.10
[7] IRanges_1.4.9

loaded via a namespace (and not attached):
[1] Biobase_2.6.1 grid_2.10.1   hwriter_1.1
>

Thanks,
P. Terry
pterry at huskers.unl.edu