[BioC] extract introns

Yating Cheng yating.cheng at charite.de
Tue Nov 15 18:19:53 CET 2011


Hi,

I checked the function"gaps". But I am not sure how to use it. Is someone
know how to use it to get the positions of introns?

Thank you.

Yating


> Hi,
>
> On Sun, Nov 13, 2011 at 9:46 AM, Yating Cheng <yating.cheng at charite.de>
> wrote:
>> Hi, everyone,
>>
>> Now I am facing new problem.
>>
>> The following file I got from biomart. So, GB*** is ensemble_gene_id.and
>> Group*** is chromosome_name.
>>
>> The first problem is that, in the file is exon_chrom_start and end, but
>> should I need intron_start and end ? since I need intron sequences.
>
> The `gaps` function will help with that ... however, if there are
> intron start/end columns you can fetch from biomart, feel free to go
> that way ...
>
>> "ensembl_gene_id" "chromosome_name" "exon_chrom_start" "exon_chrom_end"
>> "strand"
>> "1" "GB17244" "Group5.28" 95590 95776 1
>> "2" "GB17244" "Group5.28" 95865 96055 1
>> "3" "GB17244" "Group5.28" 96189 96428 1
>> "4" "GB17244" "Group5.28" 100525 100632 1
>> "5" "GB17244" "Group5.28" 100689 100751 1
>> "6" "GB17244" "Group5.28" 100835 101145 1
>> "7" "GB17244" "Group5.28" 101265 101547 1
>> "8" "GB17244" "Group5.28" 101632 101673 1
>> "9" "GB12201" "Group13.1" 374539 374712 -1
>> "10" "GB12201" "Group13.1" 366611 366778 -1
>>
>> And another prolem is in the script.
>>
>> library(Biostrings)
>>
>> library(GenomicRanges)
>>
>>  source("http://ftp:/biocLite.R")
>>    biocLite("BSgenome.Amellifera.BeeBase.assembly4")
>>
>> library(BSgenome.Amellifera.BeeBase.assembly4)
>>  library(BSgenome.Amellifera.BeeBase.assembly4)
>> #Loading required package: BSgenome
>> #Warning message:
>> #package 'BSgenome' was built under R version 2.13.2
>>
>> intron_ranges<-read.table("intron_ranges.txt")
>>
>> gr.exons <- with(intron_ranges, {
>>  GRanges(chromosome_name,
>>   IRanges(exon_chrom_start, exon_chrom_end),
>>   ifelse(strand == 1, '+', '-'))})
>>
>> intron.seqs <-getSeq(Amellifera, gr.exons, as.character=FALSE)
>>
>> #Error in .getOneSeqFromBSgenomeMultipleSequences(x, name, start, NA,
>> width,  :   sequence Group5.28 not found
>>
>> My guess is that the chromosome names I input is not the same stored in
>> the BS genome.
>
> You don't have to guess, what is the output of
> `seqlevels(Amellifera)`? You'll get a list of chromosome names there.
>
> It's your job to translate your "Group5.28" names to the ones listed
> in the Amellifera genome.
>
>> But that is the only way I can get the IRanges. Or does
>> anyone know how to get IRanges direct from BS genome.
>
> You can extract one chromosome at a time from the BSgenome object,
> then just query it w/ an IRanges object.
>
> For example, in human:
>
> R> library(BSgenome.Hsapiens.UCSC.hg19)
> R> chr1 <- unmasked(Hsapiens$chr1)
> R> some.seqs <- Views(chr1, IRanges(sample(249250600, 10), width=10))
> R> some.seqs
> some.seqs
>   Views on a 249250621-letter DNAString subject
> subject:
> NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> views:
>          start       end width
>  [1]  12274027  12274036    10 [AAAGAGCAAC]
>  [2] 149890737 149890746    10 [AAAGCCATAA]
>  [3]  49569412  49569421    10 [AAATTTGGAT]
>  [4] 230817984 230817993    10 [TTAGATTTTG]
>  [5]  23827107  23827116    10 [TTTCCTGTAT]
>  [6] 126211483 126211492    10 [NNNNNNNNNN]
>  [7] 230032024 230032033    10 [GTGAATAATT]
>  [8] 182185491 182185500    10 [TGCATTTGTC]
>  [9] 229816410 229816419    10 [TTCCTTAATG]
> [10]  99265270  99265279    10 [CACATTTTCT]
>
> HTH,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>



More information about the Bioconductor mailing list