Hi,

I am analyzing a matched pair tumor/normal rna-seq data set.  After aligning with bowtie2 against UCSC hg19 index, I am trying to get the counts for each of the samples using the iGenomes UCSC GTF file.  I came across the following tutorial which shows different ways to do it in Bioconductor:
http://faculty.ucr.edu/~tgirke/HTML_Presentations/Manuals/Workshop_Dec_12_16_2013/Rrnaseq/Rrnaseq.pdf

Following along slide 28 "Read Counting with countOverlaps", I am able to generate the counts but get the following Warnings.  I just want to be sure that there is nothing to be worried about because I don't understand the meaning of these warnings.  My code is as follows:

> library(GenomicFeatures)
Loading required package: AnnotationDbi
> library(Rsamtools)
> library(rtracklayer)
> txdb <- loadDb("./data/ucsc_igenomes_hg19_gtf.sqlite")
> eByg <- exonsBy(txdb, by="gene")
> head(eByg)
GRangesList of length 6:
$A1BG
GRanges with 8 ranges and 2 metadata columns:
      seqnames               ranges strand |   exon_id   exon_name
         <Rle>            <IRanges>  <Rle> | <integer> <character>
  [1]    chr19 [58858172, 58858395]      - |    163177        <NA>
  [2]    chr19 [58858719, 58859006]      - |    163178        <NA>
  [3]    chr19 [58861736, 58862017]      - |    163179        <NA>
  [4]    chr19 [58862757, 58863053]      - |    163180        <NA>
  [5]    chr19 [58863649, 58863921]      - |    163181        <NA>
  [6]    chr19 [58864294, 58864563]      - |    163182        <NA>
  [7]    chr19 [58864658, 58864693]      - |    163183        <NA>
  [8]    chr19 [58864770, 58864865]      - |    163184        <NA>

$A1BG-AS1
GRanges with 4 ranges and 2 metadata columns:
      seqnames               ranges strand | exon_id exon_name
  [1]    chr19 [58863336, 58864410]      + |  156640      <NA>
  [2]    chr19 [58864745, 58864840]      + |  156641      <NA>
  [3]    chr19 [58865080, 58865223]      + |  156642      <NA>
  [4]    chr19 [58865735, 58866549]      + |  156643      <NA>

...
<4 more elements>
---
seqlengths:
                 chr12                  chr8 ...  chr1_gl000192_random
                    NA                    NA ...                    NA
> samples = c("BRPC13-1118_L1.D710_501.sorted", "BRPC_13-764_L2.sorted", "DU04_PBMC_RNA_L1.sorted", "DU06_PBMC_RNA_L1.sorted")
>
> samples
[1] "BRPC13-1118_L1.D710_501.sorted" "BRPC_13-764_L2.sorted"
[3] "DU04_PBMC_RNA_L1.sorted"        "DU06_PBMC_RNA_L1.sorted"
>
> samplespath <- paste("./", samples, ".bam", sep="")
>
> samplespath
[1] "./BRPC13-1118_L1.D710_501.sorted.bam"
[2] "./BRPC_13-764_L2.sorted.bam"
[3] "./DU04_PBMC_RNA_L1.sorted.bam"
[4] "./DU06_PBMC_RNA_L1.sorted.bam"
>
> names(samplespath) <- samples
>
> samplespath
        BRPC13-1118_L1.D710_501.sorted                  BRPC_13-764_L2.sorted
"./BRPC13-1118_L1.D710_501.sorted.bam"          "./BRPC_13-764_L2.sorted.bam"
               DU04_PBMC_RNA_L1.sorted                DU06_PBMC_RNA_L1.sorted
       "./DU04_PBMC_RNA_L1.sorted.bam"        "./DU06_PBMC_RNA_L1.sorted.bam"
>
> countDF <- data.frame(row.names=names(eByg))
>
> countDF
data frame with 0 columns and 23710 rows
>
> dim(countDF)
[1] 23710     0
>
> for(i in samplespath) {
+ aligns <- readGAlignmentsFromBam(i)
+ counts <- countOverlaps(eByg, aligns, ignore.strand=TRUE)
+ countDF <- cbind(countDF, counts)
+ }
Warning messages:
1: In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chr6_cox_hap2, chr6_dbb_hap3, chr6_ssto_hap7, chr6_qbl_hap6, chr6_mann_hap4, chr6_mcf_hap5, chr6_apd_hap1, chrUn_gl000228, chr17_ctg5_hap1, chr19_gl000209_random, chr4_ctg9_hap1, chrUn_gl000222, chrUn_gl000223, chr1_gl000191_random, chr7_gl000195_random, chrUn_gl000213, chrUn_gl000220, chr17_gl000205_random, chrUn_gl000212, chrUn_gl000218, chrUn_gl000219, chrUn_gl000211, chr4_gl000193_random, chr4_gl000194_random, chr1_gl000192_random
  - in 'y': chrM
  Make sure to always combine/compare objects based on the same reference
  genome (use suppressWarnings() to suppress this warning).
2: In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chr6_cox_hap2, chr6_dbb_hap3, chr6_ssto_hap7, chr6_qbl_hap6, chr6_mann_hap4, chr6_mcf_hap5, chr6_apd_hap1, chrUn_gl000228, chr17_ctg5_hap1, chr19_gl000209_random, chr4_ctg9_hap1, chrUn_gl000222, chrUn_gl000223, chr1_gl000191_random, chr7_gl000195_random, chrUn_gl000213, chrUn_gl000220, chr17_gl000205_random, chrUn_gl000212, chrUn_gl000218, chrUn_gl000219, chrUn_gl000211, chr4_gl000193_random, chr4_gl000194_random, chr1_gl000192_random
  - in 'y': chrM
  Make sure to always combine/compare objects based on the same reference
  genome (use suppressWarnings() to suppress this warning).
3: In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chr6_cox_hap2, chr6_dbb_hap3, chr6_ssto_hap7, chr6_qbl_hap6, chr6_mann_hap4, chr6_mcf_hap5, chr6_apd_hap1, chrUn_gl000228, chr17_ctg5_hap1, chr19_gl000209_random, chr4_ctg9_hap1, chrUn_gl000222, chrUn_gl000223, chr1_gl000191_random, chr7_gl000195_random, chrUn_gl000213, chrUn_gl000220, chr17_gl000205_random, chrUn_gl000212, chrUn_gl000218, chrUn_gl000219, chrUn_gl000211, chr4_gl000193_random, chr4_gl000194_random, chr1_gl000192_random
  - in 'y': chrM
  Make sure to always combine/compare objects based on the same reference
  genome (use suppressWarnings() to suppress this warning).
4: In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chr6_cox_hap2, chr6_dbb_hap3, chr6_ssto_hap7, chr6_qbl_hap6, chr6_mann_hap4, chr6_mcf_hap5, chr6_apd_hap1, chrUn_gl000228, chr17_ctg5_hap1, chr19_gl000209_random, chr4_ctg9_hap1, chrUn_gl000222, chrUn_gl000223, chr1_gl000191_random, chr7_gl000195_random, chrUn_gl000213, chrUn_gl000220, chr17_gl000205_random, chrUn_gl000212, chrUn_gl000218, chrUn_gl000219, chrUn_gl000211, chr4_gl000193_random, chr4_gl000194_random, chr1_gl000192_random
  - in 'y': chrM
  Make sure to always combine/compare objects based on the same reference
  genome (use suppressWarnings() to suppress this warning).
>

I also wanted to verify if what I am doing is correct for paired end reads.

Thanks,

- Pankaj


=======================================
sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C
[9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

	[[alternative HTML version deleted]]

