[BioC] extract introns

Yating Cheng yating.cheng at charite.de
Sun Nov 13 15:46:36 CET 2011


Hi, everyone,

Now I am facing new problem.

The following file I got from biomart. So, GB*** is ensemble_gene_id.and
Group*** is chromosome_name.

The first problem is that, in the file is exon_chrom_start and end, but
should I need intron_start and end ? since I need intron sequences.


"ensembl_gene_id" "chromosome_name" "exon_chrom_start" "exon_chrom_end"
"strand"
"1" "GB17244" "Group5.28" 95590 95776 1
"2" "GB17244" "Group5.28" 95865 96055 1
"3" "GB17244" "Group5.28" 96189 96428 1
"4" "GB17244" "Group5.28" 100525 100632 1
"5" "GB17244" "Group5.28" 100689 100751 1
"6" "GB17244" "Group5.28" 100835 101145 1
"7" "GB17244" "Group5.28" 101265 101547 1
"8" "GB17244" "Group5.28" 101632 101673 1
"9" "GB12201" "Group13.1" 374539 374712 -1
"10" "GB12201" "Group13.1" 366611 366778 -1

And another prolem is in the script.

library(Biostrings)

library(GenomicRanges)

 source("http://ftp:/biocLite.R")
    biocLite("BSgenome.Amellifera.BeeBase.assembly4")

library(BSgenome.Amellifera.BeeBase.assembly4)
 library(BSgenome.Amellifera.BeeBase.assembly4)
#Loading required package: BSgenome
#Warning message:
#package 'BSgenome' was built under R version 2.13.2

intron_ranges<-read.table("intron_ranges.txt")

gr.exons <- with(intron_ranges, {
 GRanges(chromosome_name,
   IRanges(exon_chrom_start, exon_chrom_end),
   ifelse(strand == 1, '+', '-'))})

intron.seqs <-getSeq(Amellifera, gr.exons, as.character=FALSE)

#Error in .getOneSeqFromBSgenomeMultipleSequences(x, name, start, NA,
width,  :   sequence Group5.28 not found

My guess is that the chromosome names I input is not the same stored in
the BS genome. But that is the only way I can get the IRanges. Or does
anyone know how to get IRanges direct from BS genome.

Thank you very much.

Yating
Molecular Medicine Master's ProgramCharité
Universitätsmedizin Berlin

> Also, just to make sure the i's are dotted, and t's are crossed.
>
> This code I gave:
> R> gr.exons <- with(xcripts, {
>  GRanges(chromosome_name,
>    IRanges(exon_chrom_start, exon_chrom_end),
>    ifelse(strand == 1, '+', '-'))
> })
>


> Hi Yating,
>
> Brief note: when replying to emails form the bioconductor list, make
> sure to use "reply all" so the list is cc'd, this way others can
> benefit from the help but also they can jump in to help in better
> ways.
>
> Ok, so:
>
> On Thu, Nov 10, 2011 at 5:16 PM, Yating Cheng <yating.cheng at charite.de>
> wrote:
>> Hi Steve
>>
>> Thanks very much for your answer. I am really a new hand in bioinfo.
>>
>> Actually now I have problem to create the GRanges object. The genes are
>> from Apis mellifera.The data I have is like:
>> "ensembl_gene_id" "chromosome_name" "exon_chrom_start" "exon_chrom_end"
>> "strand"
>> "1" "GB17244" "Group5.28" 95590 95776 1
>> "2" "GB17244" "Group5.28" 95865 96055 1
>> "3" "GB17244" "Group5.28" 96189 96428 1
>> "4" "GB17244" "Group5.28" 100525 100632 1
>> "5" "GB17244" "Group5.28" 100689 100751 1
>> "6" "GB17244" "Group5.28" 100835 101145 1
>> "7" "GB17244" "Group5.28" 101265 101547 1
>> "8" "GB17244" "Group5.28" 101632 101673 1
>> "9" "GB12201" "Group13.1" 374539 374712 -1
>> "10" "GB12201" "Group13.1" 366611 366778 -1
>> "11" "GB12201" "Group13.1" 366400 366513 -1
>> "12" "GB12201" "Group13.1" 366115 366201 -1
>> "13" "GB16402" "GroupUn.2" 208099 208254 1
>> Do you know how to convert it into GRanges object.
>
> Interesting names from chromosomes ... are they really "GBXXXX"?
>
> Anyway, assuming they are, this is how you can convert that table into
> a GRanges object. Let's assume the table you listed above is in a
> data.frame called `xcripts`
>
> R> gr.exons <- with(xcripts, {
>   GRanges(chromosome_name,
>     IRanges(exon_chrom_start, exon_chrom_end),
>     ifelse(strand == 1, '+', '-'))
> })
>
> HTH,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>



More information about the Bioconductor mailing list