[BioC] extract introns
Yating Cheng
yating.cheng at charite.de
Sun Nov 13 15:42:23 CET 2011
Hi, everyone,
Now I am facing new problem.
The following file I got from biomart. So, GB*** is ensemble_gene_id.and
Group*** is chromosome_name.
The first problem is that, in the file is exon_chrom_start and end, but
should I need intron_start and end ? since I need intron sequences.
"ensembl_gene_id" "chromosome_name" "exon_chrom_start" "exon_chrom_end"
"strand"
"1" "GB17244" "Group5.28" 95590 95776 1
"2" "GB17244" "Group5.28" 95865 96055 1
"3" "GB17244" "Group5.28" 96189 96428 1
"4" "GB17244" "Group5.28" 100525 100632 1
"5" "GB17244" "Group5.28" 100689 100751 1
"6" "GB17244" "Group5.28" 100835 101145 1
"7" "GB17244" "Group5.28" 101265 101547 1
"8" "GB17244" "Group5.28" 101632 101673 1
"9" "GB12201" "Group13.1" 374539 374712 -1
"10" "GB12201" "Group13.1" 366611 366778 -1
And another prolem is in the script.
library(Biostrings)
library(GenomicRanges)
source("http://ftp:/biocLite.R")
biocLite("BSgenome.Amellifera.BeeBase.assembly4")
library(BSgenome.Amellifera.BeeBase.assembly4)
library(BSgenome.Amellifera.BeeBase.assembly4)
#Loading required package: BSgenome
#Warning message:
#package 'BSgenome' was built under R version 2.13.2
intron_ranges<-read.table("intron_ranges.txt")
gr.exons <- with(intron_ranges, {
GRanges(chromosome_name,
IRanges(exon_chrom_start, exon_chrom_end),
ifelse(strand == 1, '+', '-'))})
intron.seqs <-getSeq(Amellifera, gr.exons, as.character=FALSE)
#Error in .getOneSeqFromBSgenomeMultipleSequences(x, name, start, NA,
width, : sequence Group5.28 not found
My guess is that the chromosome names I input is not the same stored in
the BS genome. But that is the only way I can get the IRanges. Or does
anyone know how to get IRanges direct from BS genome.
Thank you very much.
Yating
Molecular Medicine Master's ProgramCharité
Universitätsmedizin Berlin
> Also, just to make sure the i's are dotted, and t's are crossed.
>
> This code I gave:
> R> gr.exons <- with(xcripts, {
> GRanges(chromosome_name,
> IRanges(exon_chrom_start, exon_chrom_end),
> ifelse(strand == 1, '+', '-'))
> })
>
> Assumes you've already loaded the GenomicRanges library via a call like
> so:
>
> R> library(GenomicRanges)
>
> You mentioned you're still a bit green with bioconductor, so I just
> wanted to make sure you knew where to find "GRanges" ... sorry if
> that's too obvious.
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
More information about the Bioconductor
mailing list