[BioC] Coverage by base
Hervé Pagès
hpages at fhcrc.org
Thu Oct 13 00:52:49 CEST 2011
On 11-10-12 02:08 PM, rohan bareja wrote:
> Hi Herve,
>
>
> Thanks Herve,Steve and Michael..I really appreciate for the help.I have been able to get the summaries as explained by you.
> As you can see below my result>viewSums(vl)
> SimpleIntegerList of length 93
> [["chr1"]] 556640 0 0 0 0 1085 3337 ... 209237 781 72987 154846 52256 168230[["chr10"]] 0 0 0 0 327 0 0 1065 0 0 0 ... 122 0 24495 29181 0 0 0 3337 0 14239
>
>
> >viewSums(vl)[["chr1"]]
> [1] 556640 0 0 0 0 1085 3337 355 0 [10] 497 445 0 17466 1065 34932 1065 27665 5251
> However,while getting the names of the genes from gene_ranges,I am getting the following error.Do you have any idea about this?
>
> > genes<- unlist(gene_ranges,use.names=FALSE)
> genesGRanges with 24774 ranges and 0 elementMetadata values seqnames ranges strand |<Rle> <IRanges> <Rle> | [1] chr19 [ 58858172, 58864865] - | [2] chr8 [ 18248755, 18258723] + | [3] chr20 [ 43248163, 43280376] - | [4] chr18 [ 25530930, 25757445] - | [5] chr1 [243651535, 244006886] - |
>
> >names(genes)<- names(gene_ranges)
> Error in `rownames<-`(`*tmp*`, value = c("1", "10", "100", "1000", "10000", : missing values not allowed in rownames
sessionInfo and reproducible example please? In particular, how did you
generate gene_ranges? As I said previously, you need to make sure that
every element in your GRangesList object 'gene_ranges' contains only 1
range. Otherwise you cannot assume 1-to-1 correspondence between its
elements and the elements of the GRanges object you get after
unlisting:
> grl <- GRangesList(GRanges("chr1", IRanges(3, 9)),
GRanges("chr2", IRanges(2:1, 5)))
> names(grl) <- c("geneA", "geneB")
> gr <- unlist(grl)
> names(gr) <- names(grl)
Error in `rownames<-`(`*tmp*`, value = c("geneA", "geneB", NA)) :
missing values not allowed in rownames
Yeah the error message is not helping a lot :-/
For example, if you use makeTranscriptDbFromUCSC() to get those genes,
it's very likely that you will face this problem:
> txdb <- makeTranscriptDbFromUCSC(genome="hg19", tablename="refGene")
> txbygene <- transcriptsBy(txdb, by="gene")
> gene_ranges <- range(txbygene)
> table(elementLengths(gene_ranges))
1 2 3 4 5 6 7 8
22721 311 15 7 18 40 73 66
> gene_ranges[which(elementLengths(gene_ranges) == 8L)[1L]]
GRangesList of length 1
$100129636
GRanges with 8 ranges and 0 elementMetadata values
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] chr6 [29004000, 29044517] + |
[2] chr6_ssto_hap7 [ 344880, 385401] + |
[3] chr6_mcf_hap5 [ 307555, 348067] + |
[4] chr6_cox_hap2 [ 522820, 563332] + |
[5] chr6_mann_hap4 [ 307401, 347917] + |
[6] chr6_apd_hap1 [ 307442, 315573] + |
[7] chr6_qbl_hap6 [ 307397, 347916] + |
[8] chr6_dbb_hap3 [ 307411, 347926] + |
seqlengths
chr1 chr2 ... chr18_gl000207_random
249250621 243199373 ... 4262
This just means that Entrez Gene ID 100129636 has transcripts on
reference sequences that are not the main chromosomes. So you need
to take extra steps to address this. One approach is to unlist
anyway and propagate the names of the genes by repeating them:
> genes <- unlist(gene_ranges, use.names=FALSE)
> names(genes) <- rep.int(names(gene_ranges),
elementLengths(gene_ranges))
(Note that the above will only work with the devel version of
GenomicFeatures, which requires R 2.14, because the current release
version of the package doesn't allow duplicate names in a GRanges
or GRangesList object).
Then subset 'genes' to get rid of the ranges that are not on the
main chromosomes:
> main_chroms <- paste("chr", c(1:22, "X", "Y", "M"), sep="")
> genes0 <- genes[seqnames(genes) %in% main_chroms]
Then check that the gene names are now unique (with
any(duplicated(names(genes0)))). If they are not, then again you'll
need to choose a reasonable strategy to get rid of the duplicates.
Note that it's not a strict requirement to have 1 range per gene,
it'll just be more convenient to summarize things at the gene
level e.g. when you do viewSums() on your SimpleRleViewsList object.
Cheers,
H.
>
>
>
> Thanks,Rohan
>
>
>
>
> [[alternative HTML version deleted]]
>
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioconductor
mailing list