[BioC] Renaming the seqlevels in a transcript database from GenomicFeatures

Sonali Arora sarora at fhcrc.org
Fri Jun 27 21:18:13 CEST 2014


Hi Fong,

A TranscriptDb cannot be directly subset with 'keepSeqlevels' and 
'dropSeqlevels'.
But the GRanges or GRangesLists extracted from the TranscriptDb can be 
subset.
Below I show an example for how to extract GRanges, subset and change the
chromosome name style from "UCSC" to "NCBI" ( ie from "chr1" to "1")

## load packages
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg18.knownGene)

## extract GRanges
txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene
seqlevels(txdb)
txbygene <- transcriptsBy(txdb, "gene")
seqlevels(txbygene)

## subset to get only ranges from chr 1 to 22
auto <- extractSeqlevelsByGroup(species="Homo sapiens", style="UCSC",
group="auto")
txbygene <- keepSeqlevels(txbygene,auto)
seqlevels(txbygene)

##convert GRangesList to GRanges
ucsc_gr <- unlist(txbygene)

## renaming styles.
newStyle <- mapSeqlevels(seqlevels(ucsc_gr),"NCBI")
ncbi_gr <- renameSeqlevels(ucsc_gr, newStyle)

##check that naming was successful
ncbi_gr[seqnames(ncbi_gr)=="1"]
ucsc_gr[seqnames(ucsc_gr)=="chr1"]

You can easily adapt this example for your scenario - to convert from
"1" to "chr1". For more details : Please  look at examples section of:
http://www.bioconductor.org/packages/devel/bioc/vignettes/GenomeInfoDb/inst/doc/GenomeInfoDb.pdf
and the man page for ?keepSeqlevels.

Thanks and Regards,
Sonali.



On 6/26/2014 2:02 PM, Fong Chun Chan wrote:
> Hi,
>
> I am using the GenomicFeatures package to extract exons from a transcript
> database file. I am using the ensembl transcript database which has no chr,
> yet my bam files that I am working have the chr prefix.
>
> I was thinking one could append chr to the seqlevels in the transcript
> database like this:
>
> library('GenomicFeatures')
> txdb <- loadDb(txdbFile)
> newSeqNames <- paste('chr', seqlevels(txdb), sep = '')
> names(newSeqNames) <- seqlevels(txdb)
> txdb <- renameSeqlevels( txdb, newSeqNames )
> seqlevels(txdb)
>
> [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chr10"
> "chr11" "chr12" [13] "chr13" "chr14"
>
> This appears to work.
>
> But when I run:
>
> transcripts(txdb, vals = list( tx_chrom = "chr1") )
>
> GRanges with 0 ranges and 2 metadata columns:
>     seqnames    ranges strand |     tx_id     tx_name
>        <Rle> <IRanges>  <Rle> | <integer> <character>
>    ---
>    seqlengths:
>
> It suggests that the chromosome renaming is not working...because when I
> run:
>
> transcripts(txdb, vals = list( tx_chrom = '1') )
>
> GRanges with 17239 ranges and 2 metadata columns:
>            seqnames                 ranges strand   |     tx_id
> tx_name
>               <Rle>              <IRanges>  <Rle>   | <integer>
> <character>
>        [1]     chr1         [11869, 14409]      +   |    126213
> ENST00000456328
>
> Has anyone had any luck in getting this to work?
>
> Thanks,
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Thanks and Regards,
Sonali



More information about the Bioconductor mailing list